Abstract


Given  oracle  access  to  some  boolean  function  f,  how  many  queries  do 
we  need  to  test  whether  f  is  linear?  Or  monotone?  Or  whether  its  output 
is  completely  determined  by  a  small  number  of  the  input  variables?  This 
thesis  studies  these  and  related  questions  in  the  framework  of  property  testing 
introduced  by  Rubinfeld  and  Sudan  (’96). 

The  results  of  this  thesis  are  grouped  into  three  main  lines  of  research. 

I.  We  determine  nearly  optimal  bounds  on  the  number  of  queries  required 
to test k-juntas (functions that depend on at most k variables) and
k-linearity (functions that return the parity of exactly k of the input bits).
These two problems are fundamental in the study of boolean functions,
and the bounds obtained for these two properties lead to tight or improved
bounds on the query complexity for testing many other properties
including,  for  example,  testing  sparse  polynomials,  testing  low  Fourier 
degree,  and  testing  computability  by  small-size  decision  trees. 

II.  We  give  a  partial  characterization  of  the  set  of  functions  for  which  we 
can  test  isomorphism — that  is,  identity  up  to  permutation  of  the  labels  of 
the  variables — with  a  constant  number  of  queries.  This  result  provides 
some  progress  on  the  question  of  characterizing  the  set  of  properties  of 
boolean  functions  that  can  be  tested  with  a  constant  number  of  queries. 

III.  We  establish  new  connections  between  property  testing  and  other  areas 
of  computer  science.  First,  we  present  a  new  reduction  between  testing 
problems  and  communication  problems.  We  use  this  reduction  to  obtain 
many new lower bounds in property testing from known results in communication
complexity. Second, we introduce a new model of property
testing that closely mirrors the active learning model. We show how testing
results in this new model may be used to improve the efficiency of
model selection algorithms in learning theory.

The results presented in this thesis are obtained by applying tools from various
mathematical areas, including probability theory, the analysis of boolean
functions,  orthogonal  polynomials,  and  extremal  combinatorics. 
Introduction
Chapter 1
Overview 


This thesis is concerned with testing properties of boolean functions. The topic is best
introduced with an illustrative example.


1.1  Black  boxes 

Imagine  that  we  are  given  some  box  with  n  switches  (each  of  which  can  be  either  “on”  or 
“off”)  and  a  lightbulb  that  lights  up  on  some  of  the  configurations  of  the  switches.  This 
box can be said to compute a boolean function $f : \{0,1\}^n \to \{0,1\}$, where we define
$f(x_1, \ldots, x_n) = 1$ iff the bulb is lit when we set switch $i$ to the “on” position iff $x_i = 1$
for $i = 1, 2, \ldots, n$.

The  first  thing  we  might  want  to  do  with  such  a  box  is  to  figure  out  what  function  it 
computes.  If  we  don’t  want  to  open  up  the  box,  the  only  way  we  can  do  this  is  to  go  through 
all $2^n$ configurations of the switches and see for which configurations the light is on. Even
if we are given a user’s manual that purports to identify the function computed by the box,
verifying that the user’s manual is correct still requires checking all $2^n$ configurations.

Imagine  now  that  we  are  again  given  the  user’s  manual  that  describes  the  function 
computed  by  the  box  and  that  the  user’s  manual  also  has  one  extra  promise: 

Due  to  manufacturing  limitations,  some  boxes  may  not  behave  as  described 
in  this  manual.  However,  the  manufacturing  process  guarantees  that  faulty 
boxes will disagree with the table above on at least 10% of the possible
configurations.
With this promise, we can now test the box much more efficiently. To do so, we simply
verify that the box is consistent with the definition in the user’s manual on 100 different
switch configurations each chosen at random. When the box is correct, it will agree with
the user’s manual definition on all the configurations. When the box is faulty, the manufacturer’s
promise guarantees that for each such configuration, the box’s output will disagree
with the user’s manual with probability at least $\frac{1}{10}$. So the probability that the box is faulty but
passes all of the tests is at most $(1 - \frac{1}{10})^{100} < \frac{1}{20000}$.
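The random spot-check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `passes_identity_test`, the parity "manual", and the two example boxes are hypothetical and do not come from the text.

```python
import random

def passes_identity_test(box, manual, n, trials=100, rng=random):
    """Accept iff `box` agrees with `manual` on `trials` uniformly random inputs."""
    for _ in range(trials):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if box(x) != manual(x):
            return False  # found a witness that the box is faulty
    return True

# Worst case allowed by the promise: the box disagrees on exactly 10% of inputs,
# so each random check misses the fault with probability 9/10.
false_accept_bound = (1 - 1 / 10) ** 100

# Hypothetical boxes for illustration (not from the text).
n = 8
manual = lambda x: sum(x) % 2   # the manual claims the box computes parity
good_box = manual
faulty_box = lambda x: 0        # disagrees with parity on half of all inputs

print(false_accept_bound < 1 / 20000)
print(passes_identity_test(good_box, manual, n))
```

A correct box is always accepted, so the test has one-sided error; only faulty boxes can be misclassified, and only with the small probability bounded above.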

This  example  is  elementary  and  the  result  not  too  surprising,  but  it  hints  at  a  rather 
remarkable  truth:  for  some  decision  problems,  there  may  be  “approximate”  formulations 
of  the  problem  that  are  extremely  easy  to  solve.  (In  this  case,  the  decision  problem  was  the 
function  identity  testing  problem,  and  the  “approximation”  aspect  was  the  “gap  promise” 
that  any  function  that  did  not  satisfy  the  identity  condition  was  “far”  from  satisfying  it.) 

The generalization can again be formulated in terms of our example. Instead of identifying
the function, we may receive a user’s manual that only describes some of the properties
of the function computed by the black box. For example, we may read the following:

Congratulations on buying the ACME JuntaBox! The JuntaBox, like our traditional
box, contains n switches. For your convenience, however, we have
built  the  JuntaBox  in  a  way  that  the  behavior  of  the  light  is  entirely  controlled 
by  at  most  10  of  the  switches. 

This  user’s  manual  does  not  identify  the  function  computed  by  the  box,  but  it  makes  a 
strong  claim  regarding  how  this  function  should  behave.  As  before,  if  we  want  to  verify 
that the claim is correct, we may have to check all $2^n$ switch configurations. Once again,
however,  the  testing  task  may  be  significantly  simplified  by  a  promise: 

Due to manufacturing limitations, some JuntaBoxes may not behave as described
in this manual. However, the manufacturing process guarantees that
faulty  boxes  will  disagree  with  any  valid  JuntaBox  on  at  least  10%  of  the 
switch  configurations. 

Unlike  in  the  function  identity  problem  discussed  earlier,  it  is  not  immediately  clear  whether 
we  can  make  use  of  this  promise  to  create  a  very  efficient  test  for  the  “junta”  property 
promised  in  the  user’s  manual.  At  the  very  least,  the  naive  strategy  of  checking  random 
switch  configurations  does  not  appear  to  be  very  helpful  since  in  this  case  we  don’t  have 
a  single  target  function  to  test  against.  Nevertheless,  as  we  will  see  in  Chapter  5,  the 
promise  is  indeed  very  helpful,  and  under  this  promise  we  can  test  the  junta  property  very 
efficiently. 
1.2 Property testing


The  function  identity  and  junta  testing  problems  described  in  the  last  section  are  only  two 
of  many  properties  that  one  may  want  to  test  on  black  boxes.  For  example,  we  might 
want  to  test  if  the  box’s  function  is  linear,  if  it  is  monotone,  or  if  it  is  computed  by  a 
small  boolean  circuit.  The  testing  problem  (and  the  “approximate”  decision  model)  for 
all  these  properties — and  many  more — can  be  described  in  the  property  testing  framework 
first introduced by Rubinfeld and Sudan [90]. A thorough presentation of this framework
is given in Chapter 3. Let us also offer a quick and informal introduction to the framework
here.

A property $\mathcal{P}$ of boolean functions $\{0,1\}^n \to \{0,1\}$ is a subset of all these functions.$^1$
The function $f$ has property $\mathcal{P}$ if $f \in \mathcal{P}$. Conversely, we say that the function $f$ is $\epsilon$-far
from $\mathcal{P}$ if $|\{x \in \{0,1\}^n : f(x) \neq g(x)\}| \geq \epsilon 2^n$ for every $g \in \mathcal{P}$.

A $q$-query $\epsilon$-tester for $\mathcal{P}$ is a randomized algorithm $A$ that, given oracle access to some
function $f : \{0,1\}^n \to \{0,1\}$, queries the value of $f$ on at most $q$ elements from $\{0,1\}^n$
and satisfies two conditions:

1. When $f$ has property $\mathcal{P}$, $A$ accepts $f$ with probability at least $\frac{2}{3}$.

2. When $f$ is $\epsilon$-far from $\mathcal{P}$, $A$ rejects $f$ with probability at least $\frac{2}{3}$.

The query complexity of the property $\mathcal{P}$ is the minimum value of $q$ for which there is
a $q$-query $\epsilon$-tester for $\mathcal{P}$. A principal goal in property testing is to determine the query
complexity  for  all  natural  properties  of  boolean  functions. 
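For very small n, the notion of being far from a property can be made concrete by brute force. The sketch below is illustrative only: the helper names `distance` and `is_eps_far`, the choice of the dictator functions as the property, and the majority function as the input are all assumptions introduced here, not part of the thesis.

```python
from itertools import product

def distance(f, g, n):
    """Fraction of inputs in {0,1}^n on which f and g disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def is_eps_far(f, prop, n, eps):
    """True iff f disagrees with every g in `prop` on at least an eps fraction of inputs."""
    return all(distance(f, g, n) >= eps for g in prop)

n = 3
# Hypothetical property: the n dictator functions x -> x_i.
dictators = [lambda x, i=i: x[i] for i in range(n)]
# Hypothetical input function: majority of three bits.
majority = lambda x: int(sum(x) >= 2)

print([distance(majority, g, n) for g in dictators])
print(is_eps_far(majority, dictators, n, 0.25))
```

This enumeration takes time exponential in n and reads the whole truth table; the point of a tester is to reach the same accept/reject decision, with bounded error, from only q queries.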


1.3  Results 

In  its  most  general  form,  the  goal  of  determining  the  query  complexity  for  all  properties 
of  boolean  functions  is  unattainable  with  the  current  (mathematical  and  algorithmic)  tools 
at  our  disposal.  In  order  to  make  some  progress  on  it,  therefore,  we  must  restrict  our 
attention  to  some  class  of  properties  of  boolean  functions.  Ideally,  we  would  like  this 
class  to  include  all — or  almost  all — the  properties  of  boolean  functions  that  we  consider 
“natural”. 

$^1$While we will generally focus on “natural” properties that can be described easily, the definition applies
equally well to any arbitrary set of functions.

In this thesis, we restrict our attention to the class of properties of boolean functions
that  are  closed  under  the  relabeling  of  the  input  variables.  This  class  includes  most  of 
the  properties  of  boolean  functions  that  we  may  consider  “natural”,  including  linearity, 
representability  by  low-degree  polynomials,  symmetric  functions,  juntas,  representability 
by  small  decision  trees  or  by  small  circuits,  monotonicity,  submodularity,  halfspaces,  low 
Fourier  degree,  etc. 

While  these  properties  have  been  extensively  studied,  three  main  questions  remain 
largely  open.  First,  what  is  the  exact  query  complexity  of  these  properties?  When  we 
began  this  research,  there  were  large  gaps  between  the  best  upper  and  lower  bounds  for 
the query complexity of most of these properties. Our results, as covered in Chapters 5-7,
improve  the  upper  and  lower  bounds  for  a  number  of  these  properties  and  close  a  number 
of  these  gaps. 

The  second  question  that  remains  largely  open  is:  why  are  some  properties  testable 
efficiently? Or, in other words, what characteristics of properties are necessary or sufficient
to guarantee, for example, that the query complexity of a property is independent
of the domain size of the input functions? Over the last few years, there has been a lot
of progress on this question when restricted to the class of algebraic properties of functions
[18, 72, 95], that is, on properties of boolean functions that satisfy stronger invariance
requirements  such  as  invariance  under  linear  or  affine  transformations.  In  this  thesis,  we 
take  a  different  approach  and  instead  study  the  important  sub-problem  of  testing  function 
isomorphism.  These  results  are  presented  in  Chapters  8-10. 

The third and final main open question that we examine in this thesis is: what connections
can we establish between property testing and other areas of computer science? In
Chapters  11  and  12,  we  describe  new  connections  to  communication  complexity  and  to 
learning  theory. 

The  following  subsections  describe  the  results  in  this  thesis  in  more  detail. 


1.3.1  Part  I:  Exact  query  complexity 

We  begin  by  examining  the  query  complexity  for  some  of  the  most  fundamental  properties 
of  boolean  functions.  Specifically,  we  study  juntas  and  k-linearity.  These  two  classes  of 
functions  play  a  particularly  fundamental  role  in  property  testing  and,  as  we  will  discuss 
in  the  later  chapters,  new  bounds  on  the  query  complexity  for  these  problems  improve  the 
bounds  for  the  query  complexity  of  a  number  of  other  properties. 

In Chapter 5, we begin by studying the problem of testing k-juntas, that is,
of testing whether a boolean function has at most k relevant variables. While it was known
previously that it is possible to test k-juntas with a number of queries that depends only on
$k$ and $\epsilon$ [52], the exact dependence on $k$ was not known. We present a new algorithm that
settles  this  question  up  to  logarithmic  factors. 

Besides  the  improved  query  complexity,  the  main  contributions  of  the  new  algorithm 
are  twofold.  First,  the  algorithm  itself  is  conceptually  very  simple,  and  closely  parallels  the 
optimal  algorithm  for  learning  juntas  with  membership  queries  [29].  Second,  the  analysis 
of the algorithm uses results on intersecting families, illustrating a new and potentially useful
connection  between  the  analysis  of  algorithms  and  extremal  combinatorics. 

In  Chapter  6,  we  examine  the  problem  of  testing  partially  symmetric  functions:  that 
is,  of  testing  whether  there  is  a  set  S  of  at  most  k  variables  such  that  the  input  function  is 
invariant  to  any  relabeling  of  the  variables  outside  S.  (See  Chapter  6  for  a  more  detailed 
introduction  to  partially  symmetric  functions.)  Every  junta  is  also  partially  symmetric,  but 
the  converse  is  not  true;  the  set  of  partially  symmetric  functions  is  much  larger.  The  main 
result  of  Chapter  6  is  that  the  junta  tester  from  the  previous  chapter  can  be  extended  to  test 
partial  symmetry  with  roughly  the  same  query  complexity. 

The  motivation  for  the  study  of  partially  symmetric  functions  was  to  better  understand 
how the invariance structure of juntas is responsible for the fact that the property is efficiently
testable. An important contribution of this chapter, required for the analysis of
the  tester,  is  the  introduction  and  analysis  of  a  new  measure  of  influence,  which  we  call 
symmetric  influence.  This  notion  may  be  of  independent  interest. 

In Chapter 7, we turn to the problem of proving lower bounds for the query complexity
of properties of boolean functions. Specifically, we examine the problem of testing
k-linearity, that is, testing whether the function returns the parity of exactly k of its variables.
We show that roughly $\min\{k, n-k\}$ queries are required for this task. This confirms a
conjecture of Goldreich [57] and also gives new lower bounds for a number of other properties,
including juntas, functions of low Fourier degree, sparse polynomials, and functions
computable by small decision trees.

The result in Chapter 7 is obtained by reducing the problem of testing k-linearity to a
purely geometrical problem on the hypercube. One interesting aspect of this result is that,
unlike most results in property testing, the lower bound we obtain is not just asymptotically
linear in $k$: it is equal to $k - o(k)$.

1.3.2  Part  II:  Testing  function  isomorphism 

The  second  part  of  the  thesis  examines  property  testing  from  a  more  qualitative  point 
of  view.  Instead  of  seeking  to  determine  the  exact  query  complexity  for  some  specific 
properties of boolean functions, the motivating question behind the research presented
here is: can we characterize the set of properties of boolean functions that can be tested
with a constant number of queries?$^2$

An  important  sub-goal  of  this  research  direction  is  to  characterize  the  set  of  functions 
for  which  we  can  test  isomorphism — that  is,  for  which  we  can  test  identity  up  to  relabeling 
of  the  input  variables — with  a  constant  number  of  queries.  This  problem  was  first  raised 
by  Fischer  et  al.  [52]  but  until  recently  this  problem  was  largely  open,  with  only  three 
exceptions:  all  symmetric  functions  and  all  juntas  on  a  constant  number  of  variables  were 
known  to  be  isomorphism-testable  with  a  constant  number  of  queries,  and  it  was  known 
that testing isomorphism to parity functions on $\omega(1) \leq k \leq o(\sqrt{n})$ variables required a
super-constant  number  of  queries. 

In  Chapter  8,  we  begin  by  unifying  and  extending  the  results  concerning  the  isomorphism- 
testability  of  symmetric  functions  and  juntas.  As  we  have  seen  above,  the  set  of  partially 
symmetric  functions  encompasses  the  set  of  juntas.  Clearly,  this  set  also  includes  the  set 
of symmetric functions. (It includes many other functions as well.) In this chapter,
we  show  that  we  can  test  isomorphism  to  any  partially  symmetric  function  with  a  constant 
number  of  queries. 

In  Chapter  9,  we  show  that  the  set  of  functions  for  which  we  cannot  test  isomorphism 
with a constant number of queries extends far beyond the set of k-linear functions for some
values of k. In fact, we show that for almost all boolean functions, testing isomorphism
to those functions requires $\Omega(n)$ queries. Since $O(n \log n)$ queries are sufficient to test
isomorphism  to  any  function,  the  result  of  this  section  shows  that  the  universal  upper 
bound  is  nearly  tight  for  almost  all  functions. 

The  result  in  Chapter  9  is  non-constructive.  In  Chapter  10,  we  give  a  large  explicit  class 
of  functions  for  which  testing  isomorphism  requires  a  super-constant  number  of  queries. 
Specifically,  we  answer  a  question  of  Fischer  et  al.  [52]  by  showing,  roughly,  that  testing 
isomorphism to k-juntas that “strongly depend” on nearly all of the relevant variables
requires  at  least  log  k  queries. 


1.3.3  Part  III:  Connections 

In  the  third  part  of  the  thesis,  we  explore  some  connections  between  property  testing  and 
other  areas  of  computer  science. 

$^2$I.e., with a number of queries that does not depend on the size of the domain of the functions.

Chapter 11 presents a connection between property testing and communication complexity.
Communication complexity has been remarkably successful in proving lower
bounds  in  many  areas  of  computer  science.  The  research  presented  in  this  chapter  was 
motivated  by  a  simple  question:  might  communication  complexity  be  useful  for  proving 
lower  bounds  in  property  testing  as  well?  The  results  in  this  chapter  show  that  the  answer 
to  this  question  is  an  emphatic  “yes”.  We  present  the  basic  reduction  that  connects  the 
two  areas  in  the  chapter  and  show  how  it  can  be  used  to  prove  lower  bounds  for  testing 
k-linearity, monotonicity, and computability by small decision trees.

In Chapter 12, we examine the connection between property testing and learning theory.
It has long been known that the two areas are closely connected. (We review some
of the details of this connection in Section 3.2.) This connection, however, is largely concerned
with the relationship between property testing and the membership query learning
model — where  the  learner  can  query  the  target  function  on  any  input  of  its  choosing. 

In  many  applications  of  learning  theory,  the  membership  query  model  is  not  realistic. 
An alternative model, called active learning, where the learner can query any input of its
choosing among those that exist in the real world, is often more realistic. As a result, this
model  is  of  great  interest  to  the  learning  community.  In  Chapter  12,  we  introduce  a  new 
model  of  property  testing,  which  we  call  active  property  testing,  with  the  same  relation 
to active learning as the standard property testing model has to membership query learning.
We describe the active testing model and present results concerning the testability of
dictator  functions  and  halfspaces  in  this  model. 
Chapter 2

Boolean  Functions 


The  main  object  of  study  in  this  thesis  is  the  boolean  function.  The  goal  of  this  section  is 
to  introduce  the  basic  definitions  and  some  fundamental  tools  we  will  use  throughout  the 
rest  of  the  thesis.  We  will  also  examine  two  classes  of  boolean  functions  that  will  appear 
repeatedly  in  later  chapters:  juntas  and  parity  functions. 

Readers  familiar  with  boolean  functions  are  encouraged  to  quickly  skim  this  chapter 
to  become  acquainted  with  the  notation.  Conversely,  for  readers  who  would  appreciate  a 
more  thorough  introduction  to  boolean  functions  and  the  tools  used  to  analyze  them,  we 
recommend  the  surveys  [80,  45]  and  the  book  [81]. 


2.1  Boolean  hypercube 

Before  examining  boolean  functions,  let  us  first  take  a  moment  to  establish  some  basic 
facts regarding their domains: the boolean hypercube $\{0,1\}^n$.

Given an element $x \in \{0,1\}^n$, we write $x_1, \ldots, x_n$ to refer to the individual coordinates
of $x$. When discussing boolean functions, we refer to the set $[n] := \{1, \ldots, n\}$ as the
set of variables of $f$. In this terminology, the value of the $i$th variable in $x$ is $x_i$.

The elements $e^1, \ldots, e^n \in \{0,1\}^n$ are defined by setting the $i$th coordinate of $e^i$ to 1
and all other coordinates to 0. More generally, for every set $S \subseteq [n]$, the characteristic
vector for $S$, denoted by $e^S \in \{0,1\}^n$, is defined by setting $e^S_i = 1$ for every $i \in S$ and
$e^S_i = 0$ for every $i \in [n] \setminus S$. We also use $0$ and $1$ to denote the all-zero and all-one vectors.
(Note that $e^{\emptyset} = 0$ and $e^{[n]} = 1$.)

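Given two bits $a, b \in \{0,1\}$, there are three basic binary operators that we use:
AND ($\wedge$), OR ($\vee$), and XOR ($\oplus$). They are all defined in the standard way.

$$a \vee b = \begin{cases} 1 & \text{if } a = 1 \text{ or } b = 1 \\ 0 & \text{otherwise.} \end{cases}$$

$$a \wedge b = \begin{cases} 1 & \text{if } a = b = 1 \\ 0 & \text{otherwise.} \end{cases}$$

$$a \oplus b = \begin{cases} 1 & \text{if } a \neq b \\ 0 & \text{otherwise.} \end{cases}$$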


We define the same operators to apply bit-wise for the elements in $\{0,1\}^n$. E.g., three
elements $x, y, z \in \{0,1\}^n$ satisfy $z = x \oplus y$ iff $z_i = x_i \oplus y_i$ for each $i \in [n]$.

The set $\{0,1\}^n$ with the bit-wise XOR operator $\oplus$ forms a group. This group, combined
with the trivial scalar multiplication operator defined by $1 \cdot x = x$ and $0 \cdot x = 0$,
forms a vector space over the field $\mathbb{F}_2$. With some abuse of notation, we use $\{0,1\}^n$ to
refer to the set, the group, and the vector space, depending on the context.

We can define the inner product of elements in $\{0,1\}^n$ by setting

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i \wedge y_i$$

for every $x, y \in \{0,1\}^n$. The norm obtained with this inner product is

$$\|x\| = \langle x, x \rangle = \sum_{i=1}^{n} x_i,$$

which is also known as the Hamming weight of $x$.

For every $S \subseteq [n]$, we define the projection operator $P_S : \{0,1\}^n \to \{0,1\}^n$ by
setting $P_S(x) = x \wedge e^S$. We will use the shorthand notation $x_S$ to represent the projection
$P_S(x)$. We will often use the projection operator to combine two or more boolean vectors.
For example, the element $z = x_S \vee y_{\overline{S}}$, where $\overline{S} = [n] \setminus S$, is the “hybrid” vector that is identical to $x$ on all
coordinates in $S$ and is identical to $y$ on all the other coordinates.
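The hypercube operations of this section translate directly into code. The following is a minimal sketch that represents elements of the hypercube as Python tuples of bits; the function names (`xor`, `inner`, `weight`, `char_vec`, `project`, `hybrid`) are chosen here for illustration and are not notation from the thesis.

```python
def xor(x, y):
    """Bit-wise XOR of two vectors of equal length."""
    return tuple(a ^ b for a, b in zip(x, y))

def inner(x, y):
    """Inner product: sum over i of (x_i AND y_i)."""
    return sum(a & b for a, b in zip(x, y))

def weight(x):
    """Hamming weight, i.e. the inner product of x with itself."""
    return inner(x, x)

def char_vec(S, n):
    """Characteristic vector of S, with variables indexed 1..n as in the text."""
    return tuple(1 if i in S else 0 for i in range(1, n + 1))

def project(x, S):
    """Projection onto S: bit-wise AND of x with the characteristic vector of S."""
    return tuple(a & b for a, b in zip(x, char_vec(S, len(x))))

def hybrid(x, y, S):
    """Vector equal to x on the coordinates in S and to y on all other coordinates."""
    complement = set(range(1, len(x) + 1)) - set(S)
    return tuple(a | b for a, b in zip(project(x, S), project(y, complement)))

x, y, S = (1, 0, 1, 1), (0, 1, 0, 0), {1, 2}
print(xor(x, y), inner(x, y), weight(x), hybrid(x, y, S))
```

Tuples keep the correspondence with the coordinate notation obvious; a packed-integer representation would be faster but would hide the indexing.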

Remark 2.1. In some settings, it is more convenient to consider the boolean hypercube as
the vector space over $\{-1,1\}^n$ or as $\mathbb{F}_2^n$. The three representations of the hypercube are
isomorphic. In the following, we will mostly use the $\{0,1\}^n$ representation.
2.2 Boolean functions


Most of the boolean functions we consider in this thesis are of the form $f : \{0,1\}^n \to
\{0,1\}$. In order to analyze these functions, it is convenient to consider the larger class of
functions mapping the boolean hypercube $\{0,1\}^n$ to the set of real numbers $\mathbb{R}$.

The set of boolean functions $\{0,1\}^n \to \mathbb{R}$, with the standard notions of function addition
and scalar multiplication of functions, forms a vector space over $\mathbb{R}$. We can define
the inner product of two functions $f, g : \{0,1\}^n \to \mathbb{R}$ in this vector space by setting

$$\langle f, g \rangle = \mathop{\mathbb{E}}_{x \in \{0,1\}^n}[f(x) \cdot g(x)],$$

where the expectation is over the uniform distribution on $\{0,1\}^n$. The inner product can
be used to define the ($L_2$) norm of $f$ as

$$\|f\|_2 = \sqrt{\langle f, f \rangle}.$$

A fundamental tool in the analysis of boolean functions is the Cauchy-Schwarz inequality,
which bounds the inner product of two functions by the product of their $L_2$ norms.

Theorem 2.2 (Cauchy-Schwarz Inequality). For any functions $f, g : \{0,1\}^n \to \mathbb{R}$,

$$\langle f, g \rangle \leq \|f\|_2 \cdot \|g\|_2.$$

Another useful tool in the analysis of boolean functions is Fourier analysis. We introduce
this tool in the next section.


2.3  Fourier  Analysis 

To  describe  the  Fourier  representation  of  boolean  functions,  we  must  first  introduce  the 
notion  of  characters  for  the  boolean  hypercube. 

Definition 2.3. A character of the vector space $\{0,1\}^n$ is a function $\chi : \{0,1\}^n \to
\{-1,1\}$ that satisfies

$$\chi(x \oplus y) = \chi(x) \cdot \chi(y)$$

for every $x, y \in \{0,1\}^n$.

In other words, a character $\chi$ is a group homomorphism between $\{0,1\}^n$ and $\{-1,1\}$.
The following proposition identifies the set of characters for $\{0,1\}^n$.
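Proposition 2.4. For every $\alpha \in \{0,1\}^n$, the function $\chi_\alpha : \{0,1\}^n \to \{-1,1\}$ defined by

$$\chi_\alpha(x) = (-1)^{\langle \alpha, x \rangle}$$

is a character of $\{0,1\}^n$.

Proof. For any $x, y \in \{0,1\}^n$,

$$\chi_\alpha(x + y) = (-1)^{\langle \alpha, x + y \rangle} = (-1)^{\langle \alpha, x \rangle + \langle \alpha, y \rangle} = (-1)^{\langle \alpha, x \rangle} \cdot (-1)^{\langle \alpha, y \rangle} = \chi_\alpha(x) \cdot \chi_\alpha(y). \qquad \square$$

Fact 2.5. For any $\alpha \in \{0,1\}^n$, we have

$$\mathop{\mathbb{E}}_{x}[\chi_\alpha(x)] = \begin{cases} 1 & \text{if } \alpha = 0 \\ 0 & \text{otherwise.} \end{cases}$$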
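Both the homomorphism property of Proposition 2.4 and the expectation in Fact 2.5 can be checked exhaustively on a small hypercube. The sketch below assumes n = 3 and uses illustrative names (`chi`, `is_homomorphism`, `means`) that are not from the text.

```python
from itertools import product

def chi(alpha, x):
    """Character value: -1 raised to the inner product of alpha and x."""
    return (-1) ** sum(a & b for a, b in zip(alpha, x))

def xor(x, y):
    return tuple(a ^ b for a, b in zip(x, y))

n = 3
cube = list(product((0, 1), repeat=n))

# Proposition 2.4: every chi_alpha is multiplicative with respect to XOR.
is_homomorphism = all(
    chi(a, xor(x, y)) == chi(a, x) * chi(a, y)
    for a in cube for x in cube for y in cube
)

# Fact 2.5: the average of chi_alpha over the cube is 1 for alpha = 0 and 0 otherwise.
means = {a: sum(chi(a, x) for x in cube) / len(cube) for a in cube}

print(is_homomorphism)
print(means[(0, 0, 0)], means[(1, 0, 1)])
```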

We  are  now  ready  to  introduce  the  Fourier  transform  and  Fourier  representation  of 
boolean  functions. 

Definition 2.6 (Fourier transform). The Fourier transform of the function $f : \{0,1\}^n \to \mathbb{R}$
is the function $\hat{f} : \{0,1\}^n \to \mathbb{R}$ defined by $\hat{f}(\alpha) = \langle f, \chi_\alpha \rangle$.

Definition 2.7 (Fourier representation). The Fourier representation (or Fourier decomposition)
of the function $f : \{0,1\}^n \to \mathbb{R}$ is

$$f(x) = \sum_{\alpha \in \{0,1\}^n} \hat{f}(\alpha) \chi_\alpha(x).$$

A fundamental property of the Fourier transform is that it preserves the squared $L_2$
norm of the function.

Theorem 2.8 (Parseval’s identity). For any function $f : \{0,1\}^n \to \mathbb{R}$,

$$\mathop{\mathbb{E}}_{x}[f(x)^2] = \|f\|_2^2 = \sum_{\alpha \in \{0,1\}^n} \hat{f}(\alpha)^2.$$


A  useful  consequence  of  Parseval’s  identity  is  that  the  distance  between  two  boolean 
functions  has  a  nice  representation  in  terms  of  the  Fourier  transform  of  the  functions. 

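Lemma 2.9. For any boolean functions $f, g : \{0,1\}^n \to \{0,1\}$,

$$\Pr_{x \in \{0,1\}^n}[f(x) \neq g(x)] = \sum_{\alpha \in \{0,1\}^n} (\hat{f}(\alpha) - \hat{g}(\alpha))^2.$$

Proof. Since $f$ and $g$ are $\{0,1\}$-valued, $\Pr_x[f(x) \neq g(x)] = \mathbb{E}_x[(f(x) - g(x))^2]$. By
Parseval’s identity, this means that

$$\Pr_{x}[f(x) \neq g(x)] = \mathop{\mathbb{E}}_{x}[(f(x) - g(x))^2] = \sum_{\alpha \in \{0,1\}^n} \widehat{f - g}(\alpha)^2.$$

The proof is completed by noting that for any $\alpha \in \{0,1\}^n$,

$$\widehat{f - g}(\alpha) = \langle f - g, \chi_\alpha \rangle = \langle f, \chi_\alpha \rangle - \langle g, \chi_\alpha \rangle = \hat{f}(\alpha) - \hat{g}(\alpha). \qquad \square$$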
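Parseval's identity and Lemma 2.9 can both be verified numerically by computing the Fourier transform by brute force. The sketch below assumes n = 3; the helper `fourier` and the two example functions (an AND of two bits and a parity of two bits) are illustrative choices, not taken from the thesis.

```python
from itertools import product

n = 3
cube = list(product((0, 1), repeat=n))

def chi(alpha, x):
    """Character value: -1 raised to the inner product of alpha and x."""
    return (-1) ** sum(a & b for a, b in zip(alpha, x))

def fourier(f):
    """Fourier coefficients: the inner product of f with each character."""
    return {a: sum(f(x) * chi(a, x) for x in cube) / len(cube) for a in cube}

# Hypothetical example functions.
f = lambda x: x[0] & x[1]   # AND of the first two bits
g = lambda x: x[0] ^ x[2]   # parity of the first and third bits
fh, gh = fourier(f), fourier(g)

# Parseval's identity for f.
parseval_lhs = sum(f(x) ** 2 for x in cube) / len(cube)
parseval_rhs = sum(c ** 2 for c in fh.values())

# Lemma 2.9: distance between f and g, computed directly and via Fourier coefficients.
dist_direct = sum(f(x) != g(x) for x in cube) / len(cube)
dist_fourier = sum((fh[a] - gh[a]) ** 2 for a in cube)

print(parseval_lhs, parseval_rhs)
print(dist_direct, dist_fourier)
```

The brute-force transform costs on the order of 4^n operations; a fast Walsh-Hadamard transform would reduce this, but the direct sum mirrors Definition 2.6 most closely.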

In  Chapter  7,  we  shall  also  make  use  of  a  basic  fact  regarding  the  Fourier  transform  of 
the  pushforward  of  boolean  functions. 

Definition 2.10 (Pushforward). Fix a subspace $W \subseteq \{0,1\}^n$. The pushforward of the
function $f : \{0,1\}^n \to \mathbb{R}$ by the linear function $g : \{0,1\}^n \to W$ is the function $g_*(f) :
W \to \mathbb{R}$ defined by

$$(g_*(f))(x) := \frac{1}{2^n} \sum_{y \in \{0,1\}^n : g(y) = x} f(y) = \mathop{\mathbb{E}}_{y \in \{0,1\}^n}\big[\mathbf{1}[g(y) = x] \cdot f(y)\big].$$

Fact 2.11. Fix $W \subseteq \{0,1\}^n$. For any function $f : \{0,1\}^n \to \mathbb{R}$ and any linear function
$g : \{0,1\}^n \to W$, the Fourier transform of $g_*(f)$ satisfies


2.4  Influence 

Another  important  tool  in  the  analysis  of  boolean  functions  is  the  notion  of  “influence” 
of  the  variables  in  the  function.  This  section  defines  the  notion  formally  and  proves  some 
basic  properties  of  influence.  As  a  first  step,  let  us  define  what  we  mean  by  “relevant”  and 
“irrelevant”  variables. 

Definition 2.12 (Relevant variables). The variable $i \in [n]$ is relevant in the boolean function
$f : \{0,1\}^n \to \{0,1\}$ if there exists an element $x \in \{0,1\}^n$ such that $f(x) \neq f(x \oplus e^i)$.
If no such element exists, we say that $i$ is irrelevant in $f$.

The definition of relevance extends in a natural way to sets of variables. We can do so
by defining a set $S \subseteq [n]$ to be relevant in the function $f$ if some variable $i \in S$ is relevant
in $f$. Equivalently, we can define relevant sets directly as follows.

Definition 2.13 (Relevant sets of variables). The set of variables $S \subseteq [n]$ is relevant in
$f : \{0,1\}^n \to \{0,1\}$ if there exist two elements $x, y \in \{0,1\}^n$ such that $x_{\overline{S}} = y_{\overline{S}}$ and
$f(x) \neq f(y)$.

We can refine the notion of relevance to take into account the fraction of
elements $x \in \{0,1\}^n$ for which $f(x) \neq f(x \oplus e^i)$. The result is our definition of influence.

Definition  2.14  (Influence  of  a  variable).  The  influence  of  the  variable  i  £  [n]  in  the 
boolean  function  /  :  (0,  l}n  — >  (0, 1}  is 

!nf/(i)  =  |  Pr [f(x)  f(x  ©  e*)] 

where  the  probability  is  taken  over  the  uniform  distribution  of  x  in  (0, 1}". 

We  can  also  introduce  a  similar  definition  to  measure  the  influence  of  sets  of  variables. 

Definition 2.15 (Influence of a set of variables). The influence of the set $S \subseteq [n]$ of
variables in the boolean function $f : \{0,1\}^n \to \{0,1\}$ is

$$\mathrm{Inf}_f(S) = \Pr_{x,y}[f(x) \neq f(x_{\bar S} \sqcup y_S)]$$

where the probability is taken over the uniform distribution of $x$ and $y$ in $\{0,1\}^n$.

The reader may find it slightly puzzling that the definition of influence of variables
includes a factor of $\frac{1}{2}$. This is done so that the influence of a variable is equal to the
influence of a singleton set containing that variable.

Fact 2.16. For any function $f : \{0,1\}^n \to \{0,1\}$ and any variable $i \in [n]$, we have
$\mathrm{Inf}_f(i) = \mathrm{Inf}_f(\{i\})$.
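
The two definitions of influence can be checked by brute force on small functions. The following is a minimal sketch, not part of the original text: it assumes variables are indexed from 0, represents inputs as tuples of bits, and uses the three-bit majority function `maj3` purely as a hypothetical example. The helper names are illustrative.

```python
from itertools import product

def influence_of_variable(f, n, i):
    """Inf_f(i) = (1/2) * Pr_x[f(x) != f(x with bit i flipped)], x uniform."""
    flips = 0
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1  # flip coordinate i, i.e. x XOR e_i
        flips += f(x) != f(tuple(y))
    return flips / (2 * 2 ** n)

def influence_of_set(f, n, S):
    """Inf_f(S) = Pr_{x,y}[f(x) != f(z)], where z copies x outside S and y inside S."""
    differ = 0
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=n):
            z = tuple(y[j] if j in S else x[j] for j in range(n))
            differ += f(x) != f(z)
    return differ / 4 ** n

# Hypothetical example function: majority of three bits.
def maj3(x):
    return int(sum(x) >= 2)

print(influence_of_variable(maj3, 3, 0))  # 0.25
print(influence_of_set(maj3, 3, {0}))     # 0.25, matching Fact 2.16
```

The enumeration takes time exponential in $n$, so it is only meant as a sanity check on toy examples.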

An  immediate  observation  that  follows  from  our  definitions  of  relevance  and  influence 
is  that  the  variables  that  are  relevant  in  a  function  are  exactly  the  variables  with  non-zero 
influence  in  that  function. 

Fact 2.17. The variable $i \in [n]$ is relevant in $f : \{0,1\}^n \to \{0,1\}$ iff $\mathrm{Inf}_f(i) > 0$.
Similarly, the set $S \subseteq [n]$ of variables is relevant in $f$ iff $\mathrm{Inf}_f(S) > 0$.

A less obvious, but much more useful, observation is that the notions of influence
have a natural representation in terms of the Fourier transform of a boolean function.

Lemma 2.18. For every function $f : \{0,1\}^n \to \{0,1\}$ and every variable $i \in [n]$, the
influence of $i$ in $f$ satisfies

$$\mathrm{Inf}_f(i) = 2 \sum_{\alpha \in \{0,1\}^n : \alpha_i = 1} \hat f(\alpha)^2.$$

More generally, for every set $S \subseteq [n]$, the influence of $S$ in $f$ satisfies

$$\mathrm{Inf}_f(S) = 2 \sum_{\alpha \in \{0,1\}^n : \|\alpha_S\| > 0} \hat f(\alpha)^2.$$


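Proof. By Fact 2.16, it suffices to prove the latter identity.

Since $f$ is $\{0,1\}$-valued, we have

$$\Pr_{x,y}[f(x) \neq f(x_{\bar S} \sqcup y_S)] = \mathbf{E}_{x,y}\big[(f(x) - f(x_{\bar S} \sqcup y_S))^2\big] = 2\,\mathbf{E}_x[f(x)^2] - 2\,\mathbf{E}_{x,y}[f(x) f(x_{\bar S} \sqcup y_S)].$$

The Fourier representation of $f$ and linearity of expectation yield

$$\mathbf{E}_{x,y}[f(x) f(x_{\bar S} \sqcup y_S)] = \sum_{\alpha, \beta \in \{0,1\}^n} \hat f(\alpha) \hat f(\beta)\, \mathbf{E}_{x,y}[\chi_\alpha(x) \chi_\beta(x_{\bar S} \sqcup y_S)].$$

By the identity $\chi_\beta(x_{\bar S} \sqcup y_S) = \chi_{\beta_{\bar S}}(x) \chi_{\beta_S}(y)$, we can rewrite the right-most term as
$\mathbf{E}_{x,y}[\chi_\alpha(x) \chi_\beta(x_{\bar S} \sqcup y_S)] = \mathbf{E}_x[\chi_{\alpha \oplus \beta_{\bar S}}(x)]\, \mathbf{E}_y[\chi_{\beta_S}(y)]$. Applying Fact 2.5, this means that
$\mathbf{E}_{x,y}[\chi_\alpha(x) \chi_\beta(x_{\bar S} \sqcup y_S)]$ takes the value 1 when $\alpha = \beta_{\bar S}$ and $\beta_S = 0$ (or, equivalently, when
$\alpha_S = 0$ and $\beta = \alpha$) and it takes the value 0 otherwise. This observation and Parseval’s
identity imply that

$$\Pr_{x,y}[f(x) \neq f(x_{\bar S} \sqcup y_S)] = 2\,\mathbf{E}_x[f(x)^2] - 2\,\mathbf{E}_{x,y}[f(x) f(x_{\bar S} \sqcup y_S)] = 2 \sum_{\alpha} \hat f(\alpha)^2 - 2 \sum_{\alpha : \alpha_S = 0} \hat f(\alpha)^2 = 2 \sum_{\alpha : \|\alpha_S\| > 0} \hat f(\alpha)^2. \qquad \Box$$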


The  monotonicity  and  subadditivity  properties  of  influence  follow  directly  from  the 
lemma. 

Theorem 2.19 (Monotonicity and subadditivity of influence). For any boolean function
$f : \{0,1\}^n \to \{0,1\}$ and any two sets $S, T \subseteq [n]$,

$$\mathrm{Inf}_f(S) \le \mathrm{Inf}_f(S \cup T) \le \mathrm{Inf}_f(S) + \mathrm{Inf}_f(T).$$

Proof. Any $\alpha \in \{0,1\}^n$ that satisfies $\|\alpha_S\| > 0$ also satisfies $\|\alpha_{S \cup T}\| > 0$. Therefore, by
Lemma 2.18,

$$\mathrm{Inf}_f(S) = 2 \sum_{\alpha : \|\alpha_S\| > 0} \hat f(\alpha)^2 \le 2 \sum_{\alpha : \|\alpha_{S \cup T}\| > 0} \hat f(\alpha)^2 = \mathrm{Inf}_f(S \cup T).$$

Similarly, any $\alpha \in \{0,1\}^n$ that satisfies $\|\alpha_{S \cup T}\| > 0$ must satisfy at least one of the two
inequalities $\|\alpha_S\| > 0$ and $\|\alpha_T\| > 0$. So

$$\mathrm{Inf}_f(S \cup T) = 2 \sum_{\alpha : \|\alpha_{S \cup T}\| > 0} \hat f(\alpha)^2 \le 2 \sum_{\alpha : \|\alpha_S\| > 0} \hat f(\alpha)^2 + 2 \sum_{\alpha : \|\alpha_T\| > 0} \hat f(\alpha)^2 = \mathrm{Inf}_f(S) + \mathrm{Inf}_f(T). \qquad \Box$$
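
The Fourier formula of Lemma 2.18 and the inequalities of Theorem 2.19 can be verified numerically on a toy function. The sketch below is illustrative only. It assumes the character convention $\chi_\alpha(x) = (-1)^{\alpha \cdot x}$ and Fourier coefficients $\hat f(\alpha) = \mathbf{E}_x[f(x)\chi_\alpha(x)]$ (both are assumptions here, since the definitions appear earlier in the thesis), indexes variables from 0, and uses the hypothetical example `maj3`.

```python
from itertools import product

def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^(alpha . x); this sign convention is an assumption."""
    return -1 if sum(a & b for a, b in zip(alpha, x)) % 2 else 1

def fourier_coefficients(f, n):
    """Map each alpha in {0,1}^n to E_x[f(x) * chi_alpha(x)] under the uniform distribution."""
    points = list(product((0, 1), repeat=n))
    return {alpha: sum(f(x) * chi(alpha, x) for x in points) / len(points)
            for alpha in points}

def influence_fourier(f, n, S):
    """2 * sum of squared coefficients over all alpha that are non-zero somewhere on S."""
    coeffs = fourier_coefficients(f, n)
    return 2 * sum(c * c for alpha, c in coeffs.items() if any(alpha[j] for j in S))

def influence_direct(f, n, S):
    """Pr_{x,y}[f(x) != f(z)], where z copies x outside S and y inside S."""
    points = list(product((0, 1), repeat=n))
    differ = 0
    for x in points:
        for y in points:
            z = tuple(y[j] if j in S else x[j] for j in range(n))
            differ += f(x) != f(z)
    return differ / len(points) ** 2

# Hypothetical example function: majority of three bits.
def maj3(x):
    return int(sum(x) >= 2)

for S in ({0}, {0, 1}, {0, 1, 2}):
    print(sorted(S), influence_direct(maj3, 3, S), influence_fourier(maj3, 3, S))
```

For `maj3` both columns agree on every set, and the values for $\{0\}$, $\{0,1\}$ and $\{0,1,2\}$ are non-decreasing, as monotonicity predicts.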


2.5  Juntas 


Juntas  figure  prominently  in  the  rest  of  this  thesis.  In  this  section,  we  define  the  term  more 
formally  and  provide  a  basic  fact  about  functions  that  are  “far”  from  being  juntas  for  later 
use.  A  more  complete  introduction  to  juntas  and  their  role  in  property  testing  is  deferred 
to  Chapter  5. 

Definition 2.20 (Junta). Fix $0 < k < n$. The function $f : \{0,1\}^n \to \{0,1\}$ is a $k$-junta iff
it contains at most $k$ relevant variables.


Note that when we say that $f$ is a junta (without specifying $k$), we mean that $f$ is a
$k$-junta for some $k = O(1)$. (I.e., for some $k$ independent of $n$.)

A  simple  characterization  of  the  influence  of  variables  in  functions  that  are  far  from 
juntas  is  as  follows. 

Lemma 2.21. Fix $0 < k < n$. Let $f : \{0,1\}^n \to \{0,1\}$ be $\epsilon$-far from all functions that
are $k$-juntas. Then every set $J \subseteq [n]$ of size $|J| \le k$ satisfies $\mathrm{Inf}_f(\bar J) \ge \epsilon$.


Proof. Fix any set $J \subseteq [n]$ of size $|J| \le k$. Define $h : \{0,1\}^n \to \{0,1\}$ to be the function
obtained by setting

$$h(x) = \begin{cases} 1 & \text{if } \mathbf{E}_z[f(x_J \sqcup z_{\bar J})] \ge \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

for every $x \in \{0,1\}^n$. The variables that are relevant in $h$ are contained in $J$, so $h$ is a
$k$-junta. By the assumption in the lemma statement, we must therefore have that

$$\Pr_x[f(x) \neq h(x)] \ge \epsilon.$$

For each $w \in \{0,1\}^n$, define

$$p_w := \Pr_x[f(x) \neq h(x) \mid x_J = w_J].$$

By the definition of $h$, for every $w$ we have $p_w \le \frac{1}{2}$. We also have that

$$\Pr_x[f(x) \neq h(x)] = \mathbf{E}_x[p_x]$$

and that

$$\mathrm{Inf}_f(\bar J) = \Pr_{x,z}[f(x) \neq f(x_J \sqcup z_{\bar J})] = 2\,\mathbf{E}_x[p_x(1 - p_x)].$$

Combining the above results, and using $1 - p_x \ge \frac{1}{2}$, we get

$$\mathrm{Inf}_f(\bar J) \ge 2\,\mathbf{E}_x\big[\tfrac{1}{2} p_x\big] = \Pr_x[f(x) \neq h(x)] \ge \epsilon. \qquad \Box$$
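
The construction in this proof is easy to run on a small example. The sketch below is illustrative only: it builds the function $h$ by taking the majority value of $f$ over the coordinates outside $J$ and compares $\mathrm{dist}(f, h)$ with the influence of the complement of $J$. Variables are indexed from 0, and the function names and the example `maj3` are assumptions made for the illustration.

```python
from itertools import product

def distance(f, g, n):
    """Fraction of inputs in {0,1}^n on which f and g disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def majority_junta(f, n, J):
    """The function h from the proof: majority value of f over the coordinates outside J."""
    J = sorted(J)
    values = {}
    for x in product((0, 1), repeat=n):
        values.setdefault(tuple(x[j] for j in J), []).append(f(x))
    # Ties are broken towards 1, matching the threshold "at least 1/2".
    table = {key: int(2 * sum(v) >= len(v)) for key, v in values.items()}
    return lambda x: table[tuple(x[j] for j in J)]

def influence_of_set(f, n, S):
    """Pr_{x,y}[f(x) != f(z)], where z copies x outside S and y inside S."""
    points = list(product((0, 1), repeat=n))
    differ = 0
    for x in points:
        for y in points:
            z = tuple(y[j] if j in S else x[j] for j in range(n))
            differ += f(x) != f(z)
    return differ / len(points) ** 2

# Hypothetical example function: majority of three bits.
def maj3(x):
    return int(sum(x) >= 2)

n, J = 3, {0}
h = majority_junta(maj3, n, J)
complement = set(range(n)) - J
print(distance(maj3, h, n), influence_of_set(maj3, n, complement))  # 0.25 0.375
```

Here $h$ is the dictator function $x \mapsto x_0$, and the influence of the complement (0.375) is indeed at least the distance to $h$ (0.25).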


2.6  Parity  functions 

Another  class  of  functions  that  appears  prominently  throughout  this  thesis — and,  indeed, 
that  often  appears  in  almost  any  study  of  boolean  functions — is  the  set  of  parity  functions. 

Definition 2.22 (Parity). Fix $S \subseteq [n]$. The parity function corresponding to $S$ is the
function $\mathrm{Parity}_S : \{0,1\}^n \to \{0,1\}$ defined by

$$\mathrm{Parity}_S(x) = \bigoplus_{i \in S} x_i.$$

The parity functions are linear functions since they satisfy the identity

$$\mathrm{Parity}_S(x \oplus y) = \mathrm{Parity}_S(x) \oplus \mathrm{Parity}_S(y)$$

for every $x, y \in \{0,1\}^n$. When $|S| = k$, we also say that the function $\mathrm{Parity}_S$ is $k$-linear.
The parity functions have a simple Fourier representation.

Proposition 2.23. Fix a non-empty set $S \subseteq [n]$. The Fourier coefficients of the parity function
$\mathrm{Parity}_S : \{0,1\}^n \to \{0,1\}$ are given by

$$\widehat{\mathrm{Parity}}_S(\alpha) = \begin{cases} \frac{1}{2} & \text{if } \alpha = 0 \\ -\frac{1}{2} & \text{if } \alpha = e_S \\ 0 & \text{otherwise.} \end{cases}$$

We  can  use  the  Fourier  decomposition  of  parity  functions  to  establish  the  following 
useful  facts. 

Proposition 2.24. Fix $0 < k < n$ and let $S \subseteq [n]$ be a set of size $|S| > k$. Then the parity
function $\mathrm{Parity}_S : \{0,1\}^n \to \{0,1\}$ is

i. $\frac{1}{2}$-far from $\mathrm{Parity}_T$ for every $T \neq S$,

ii. $\frac{1}{2}$-far from every $k$-linear function,

iii. $\frac{1}{2}$-far from every $k$-junta, and

iv. $\frac{1}{2}$-far from all functions of Fourier degree at most $k$.

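Proof. By Lemma 2.9 and Proposition 2.23,

$$\Pr_x[\mathrm{Parity}_S(x) \neq \mathrm{Parity}_T(x)] = \sum_{\alpha} \big(\widehat{\mathrm{Parity}}_S(\alpha) - \widehat{\mathrm{Parity}}_T(\alpha)\big)^2 = 2\big(\tfrac{1}{2}\big)^2 = \tfrac{1}{2}.$$

This completes the proof of i. It also immediately implies ii since $k$-linear functions are
necessarily parity functions on some set $T \neq S$. To prove iii and iv, it suffices to prove the
latter statement, since $k$-juntas have Fourier degree at most $k$.

Let $f : \{0,1\}^n \to \{0,1\}$ be a function of Fourier degree at most $k$. Since $|S| > k$, we
can again apply Lemma 2.9 and Proposition 2.23 to obtain

$$\Pr_x[f(x) \neq \mathrm{Parity}_S(x)] = \sum_{\alpha} \big(\hat f(\alpha) - \widehat{\mathrm{Parity}}_S(\alpha)\big)^2 = \big(\hat f(0) - \tfrac{1}{2}\big)^2 + \sum_{0 < \|\alpha\| \le k} \hat f(\alpha)^2 + \tfrac{1}{4}.$$

By Parseval’s identity and the Fourier degree of $f$,

$$\sum_{0 < \|\alpha\| \le k} \hat f(\alpha)^2 = \sum_{\|\alpha\| > 0} \hat f(\alpha)^2 = \|f\|_2^2 - \hat f(0)^2.$$

The function $f$ is $\{0,1\}$-valued, so $\|f\|_2^2 = \mathbf{E}_x[f(x)^2] = \mathbf{E}_x[f(x)] = \hat f(0)$. As a result,

$$\Pr_x[f(x) \neq \mathrm{Parity}_S(x)] = \big(\hat f(0) - \tfrac{1}{2}\big)^2 + \big(\hat f(0) - \hat f(0)^2\big) + \tfrac{1}{4} = \tfrac{1}{2}. \qquad \Box$$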
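
Parts i and iii of Proposition 2.24 can be confirmed exhaustively for very small $n$. The following sketch is an illustration under stated assumptions: variables are indexed from 0, and `all_juntas` (a hypothetical helper) enumerates every function that reads only a fixed set of $k$ coordinates, which covers every $k$-junta.

```python
from itertools import combinations, product

def parity(S):
    """Parity_S(x): XOR of the bits of x indexed by S."""
    idx = tuple(S)
    return lambda x: sum(x[i] for i in idx) % 2

def distance(f, g, n):
    """Fraction of inputs in {0,1}^n on which f and g disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def all_juntas(n, k):
    """Yield every function on n bits that depends only on some fixed k coordinates."""
    for J in combinations(range(n), k):
        for table in product((0, 1), repeat=2 ** k):
            lookup = dict(zip(product((0, 1), repeat=k), table))
            # Default arguments freeze J and lookup for each generated function.
            yield lambda x, J=J, lookup=lookup: lookup[tuple(x[j] for j in J)]

n = 3
full_parity = parity((0, 1, 2))
print(distance(parity((0, 1)), parity((1, 2)), n))                  # 0.5
print(min(distance(full_parity, g, n) for g in all_juntas(n, 2)))   # 0.5
```

In this example every 2-junta is at distance exactly $\frac{1}{2}$ from the parity on all three bits, since fixing any two coordinates leaves the parity balanced over the third.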

Chapter 3

Property Testing


This chapter presents a brief introduction to property testing. We introduce all the necessary
definitions, then explore two fundamental topics that we will use in later sections: the
connection between property testing and learning theory, and a main lemma for proving
lower bounds on the query complexity for testing properties via Yao’s Minimax Lemma.
For a more thorough introduction to property testing, we recommend the surveys [87, 89]
and the collection [58].


3.1  Definitions 

Definition 3.1 (Property). A property of boolean functions is a subset of all boolean functions.

When $f : \{0,1\}^n \to \{0,1\}$ is in $\mathcal{P}$, we say that $f$ has property $\mathcal{P}$. We say that $f$ is
“far” from $\mathcal{P}$ when every function in $\mathcal{P}$ disagrees with $f$ on a large fraction of the inputs
in $\{0,1\}^n$. To make this notion precise, we introduce measures of distance for functions
and for properties of boolean functions.

Definition 3.2 (Distance between functions). Fix a distribution $\mathcal{D}$ over $\{0,1\}^n$. The distance
between the two functions $f, g : \{0,1\}^n \to \{0,1\}$ under $\mathcal{D}$ is

$$\mathrm{dist}_{\mathcal{D}}(f, g) := \Pr_{x \sim \mathcal{D}}[f(x) \neq g(x)].$$

When $\mathcal{D}$ is the uniform distribution over $\{0,1\}^n$, we omit the subscript and write simply
$\mathrm{dist}(f, g)$.

Definition 3.3 (Distance between properties). Fix a distribution $\mathcal{D}$ over $\{0,1\}^n$. The distance
between two properties $\mathcal{P}, \mathcal{Q} \subseteq \{0,1\}^n \to \{0,1\}$ is

$$\mathrm{dist}_{\mathcal{D}}(\mathcal{P}, \mathcal{Q}) := \min_{f \in \mathcal{P},\, g \in \mathcal{Q}} \mathrm{dist}_{\mathcal{D}}(f, g).$$

Once again, we omit the subscript when $\mathcal{D}$ is the uniform distribution over $\{0,1\}^n$.

Our main measure of interest is the distance between a function and a property. We
define this measure using the distance between properties.

Definition 3.4 (Distance between functions and properties). Fix a distribution $\mathcal{D}$ over
$\{0,1\}^n$. The distance between the function $f : \{0,1\}^n \to \{0,1\}$ and the property
$\mathcal{P} \subseteq \{0,1\}^n \to \{0,1\}$ is

$$\mathrm{dist}_{\mathcal{D}}(f, \mathcal{P}) := \mathrm{dist}_{\mathcal{D}}(\{f\}, \mathcal{P}).$$

As above, we omit the subscript when $\mathcal{D}$ is the uniform distribution over $\{0,1\}^n$.

When $\mathrm{dist}_{\mathcal{D}}(f, \mathcal{P}) \ge \epsilon$, we say that $f$ is $\epsilon$-far from $\mathcal{P}$ (under the distribution $\mathcal{D}$). When
$\mathrm{dist}_{\mathcal{D}}(f, \mathcal{P}) < \epsilon$, then $f$ is $\epsilon$-close to $\mathcal{P}$.

We  are  now  ready  to  present  the  notion  of  property  testers,  as  originally  defined  by 
Rubinfeld  and  Sudan  [90]. 

Definition 3.5 (Property tester). Fix a property $\mathcal{P} \subseteq \{0,1\}^n \to \{0,1\}$ and a distribution
$\mathcal{D}$ over $\{0,1\}^n$. A $q$-query $\epsilon$-tester for $\mathcal{P}$ over $\mathcal{D}$ is a randomized algorithm that queries an
unknown function $f : \{0,1\}^n \to \{0,1\}$ on at most $q$ inputs and

(i) Accepts with probability at least $\frac{2}{3}$ when $f$ has property $\mathcal{P}$; and

(ii) Rejects with probability at least $\frac{2}{3}$ when $f$ is $\epsilon$-far from $\mathcal{P}$.

Remark 3.6. Note that the tester is free to accept or reject $f$ when $0 < \mathrm{dist}_{\mathcal{D}}(f, \mathcal{P}) < \epsilon$.

Remark 3.7. The choice of $\frac{2}{3}$ for the acceptance and rejection probabilities is somewhat
arbitrary. More generally, we could require those probabilities to be $1 - \delta$ for any $0 < \delta < \frac{1}{2}$.
Standard boosting arguments show that the two settings are essentially equivalent, so
we do not bother with the more general definition in this thesis.

Definition 3.8 (One-sided error). A $q$-query $\epsilon$-tester for $\mathcal{P}$ that is guaranteed to always
accept functions with property $\mathcal{P}$ (instead of only accepting them with probability at least
$\frac{2}{3}$) is said to have one-sided error. Otherwise, the tester has two-sided error.

Definition 3.9 (Non-adaptive testers). A $q$-query $\epsilon$-tester for $\mathcal{P}$ that determines all of its $q$
queries before observing the value of $f$ on any of those queries is non-adaptive. Otherwise,
the tester is adaptive.

A well-known folkloric result in property testing states that the most query-efficient
non-adaptive tester for a property $\mathcal{P}$ has query complexity that is at most exponential in
the query complexity of the best adaptive tester for $\mathcal{P}$.

Proposition 3.10. Fix a property $\mathcal{P} \subseteq \{0,1\}^n \to \{0,1\}$. If there is an adaptive $q$-query
$\epsilon$-tester for $\mathcal{P}$, then there is a non-adaptive $2^q$-query $\epsilon$-tester for $\mathcal{P}$.

Proof. Let $A$ be the adaptive $q$-query $\epsilon$-tester for $\mathcal{P}$. Once its randomness is fixed, the behavior
of $A$ is determined by a decision tree of depth $q$. By fixing the randomness in advance, we can
simulate $A$ with a non-adaptive algorithm by making all the (at most) $2^q$ queries that are present
in the tree, then following the path that $A$ would have traversed in making its $q$ adaptive queries. □


3.2  Testing  and  Learning 

The connection between property testing and learning theory that we describe in this section
was first established by Goldreich, Goldwasser, and Ron [59]. Before describing the
connection itself, let us first define the concept of a “proper learner”.

Definition 3.11 (Proper learner). An algorithm $A$ with query access to a boolean function
is an $(\epsilon, \delta, q)$ proper learner for property $\mathcal{P}$ over the distribution $\mathcal{D}$ if for every function
$f : \{0,1\}^n \to \{0,1\}$ in $\mathcal{P}$, the algorithm $A$ queries $f$ on at most $q$ inputs from $\{0,1\}^n$
then outputs a function $h : \{0,1\}^n \to \{0,1\}$ in $\mathcal{P}$ such that with probability at least $1 - \delta$,
the distance between $f$ and $h$ is bounded by $\mathrm{dist}_{\mathcal{D}}(f, h) \le \epsilon$.

Remark 3.12. We emphasize that the learner is free to output any function $h \in \mathcal{P}$ when
$f \notin \mathcal{P}$.

Our  definition  corresponds  to  the  concept  of  a  proper  learner  in  the  membership  query 
learning  model.  We  will  examine  other  learning  models  in  Chapter  12.  Also,  the  common 
terminology  in  the  learning  theory  community  refers  to  learners  over  classes  of  functions 
instead  of  over  properties;  the  two  definitions  are  identical. 

Proper  learners  can  be  used  to  test  properties,  as  the  following  lemma  shows. 

Lemma 3.13. If $\mathcal{P}$ has a $(\frac{\epsilon}{2}, \frac{1}{6}, q)$ proper learner over $\mathcal{D}$, then $\mathcal{P}$ can be tested with
$q + O(1/\epsilon)$ queries.

Proof. Let $A$ be a $(\frac{\epsilon}{2}, \frac{1}{6}, q)$ proper learner over $\mathcal{D}$ for $\mathcal{P}$. We can use $A$ to test $\mathcal{P}$ by running
the following simple testing algorithm:

1. Run $A$ on $f$ to obtain the hypothesis function $h : \{0,1\}^n \to \{0,1\}$.

2. Query $f$ on $s = O(1/\epsilon)$ samples drawn independently at random from $\mathcal{D}$.

3. Let $d$ be the fraction of inputs chosen in the last step on which $f$ and $h$ disagree.

4. Accept iff $d \le \frac{3}{4}\epsilon$.

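Let us now examine why this algorithm is a valid tester for $\mathcal{P}$. First, when $f \in \mathcal{P}$, with
probability at least $\frac{5}{6}$, $A$ returns a function $h$ that satisfies $\mathrm{dist}_{\mathcal{D}}(f, h) \le \frac{\epsilon}{2}$. Conversely,
when $f$ is $\epsilon$-far from $\mathcal{P}$, then no matter what hypothesis function $h \in \mathcal{P}$ is returned by
the learner, $\mathrm{dist}_{\mathcal{D}}(f, h) \ge \epsilon$. Since $d$ is an unbiased estimator for $\mathrm{dist}_{\mathcal{D}}(f, h)$, we can pick
$s$ to be large enough to guarantee that $\Pr[|d - \mathrm{dist}_{\mathcal{D}}(f, h)| \ge \frac{\epsilon}{4}] \le \frac{1}{6}$ and the resulting
algorithm is a valid tester for $\mathcal{P}$. □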


When the property $\mathcal{P}$ is small, there is a simple but query-efficient proper learning
algorithm for target functions in $\mathcal{P}$: draw $O(\log |\mathcal{P}|)$ random samples, and output any
hypothesis in $\mathcal{P}$ that is consistent with the function on all the samples. As a result,
Lemma 3.13 has the following widely-applicable corollary.

Corollary 3.14. Fix $\mathcal{P} \subseteq \{0,1\}^n \to \{0,1\}$. There is an $\epsilon$-tester for $\mathcal{P}$ that requires only
$O(\log(|\mathcal{P}|)/\epsilon)$ queries.

Proof. By Occam’s Razor (see, e.g. [73, §2]), there is an $(\frac{\epsilon}{2}, \frac{1}{6}, q)$ proper learner for $\mathcal{P}$
that makes $q = O(\log(|\mathcal{P}|)/\epsilon)$ queries to the target function.¹ The corollary then follows
immediately from Lemma 3.13. □


3.3  Lower  Bounds  via  Yao’s  Minimax  Principle 


We  use  the  following  standard  property  testing  lemmas  in  the  following  sections. 

¹ In fact, as discussed above, the proper learner only needs to receive $q$ random samples drawn from
$\{0,1\}^n$.

Lemma 3.15. Let $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$ be any two distributions over functions $\{0,1\}^n \to \{0,1\}$.
If for every set $X \subseteq \{0,1\}^n$ of size $|X| = q$ and any vector $r \in \{0,1\}^q$ we have that

$$\Big| \Pr_{f \sim \mathcal{D}_{\mathrm{yes}}}[f(X) = r] - \Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[f(X) = r] \Big| < \tfrac{1}{36}\, 2^{-q},$$

then any algorithm that distinguishes functions drawn from $\mathcal{D}_{\mathrm{yes}}$ from those drawn from
$\mathcal{D}_{\mathrm{no}}$ with probability at least $\frac{2}{3}$ makes at least $q + 1$ queries.

Proof. Define $\mathcal{D}$ to be the distribution obtained by drawing a function from $\mathcal{D}_{\mathrm{yes}}$ or from
$\mathcal{D}_{\mathrm{no}}$, each with probability 1/2. By Yao’s Minimax Principle [100], to prove the lemma
it suffices to show that any deterministic testing algorithm needs at least $q + 1$ queries to
distinguish functions drawn from $\mathcal{D}_{\mathrm{yes}}$ or from $\mathcal{D}_{\mathrm{no}}$ with probability at least 2/3.

A deterministic testing algorithm can be described by a decision tree with a query
$x \in \{0,1\}^n$ at each internal node and a decision to accept or reject at every leaf. Each
boolean function $f$ defines a path through the tree according to the value of $f(x)$ at each
internal node.

Consider a testing algorithm that makes at most $q$ queries. Its decision tree has depth
at most $q$ and it has at most $2^q$ leaves. Let us call a leaf $\ell$ negligible if the probability that
a function $f \sim \mathcal{D}$ defines a path that terminates at $\ell$ is at most $\frac{1}{12} 2^{-q}$. The total probability
that $f \sim \mathcal{D}$ defines a path to a negligible leaf is at most $\frac{1}{12}$.

Fix $\ell$ to be some non-negligible leaf. This leaf corresponds to a set $X \subseteq \{0,1\}^n$ of $q$
queries and a vector $r \in \{0,1\}^q$ of responses; a function $f$ defines a path to the leaf $\ell$ iff
$f(X) = r$. Since $\ell$ is non-negligible, $\Pr_{f \sim \mathcal{D}}[f(X) = r] > \frac{1}{12} 2^{-q}$. So by the hypothesis of
the lemma,

$$\Big| \Pr_{f \sim \mathcal{D}_{\mathrm{yes}}}[f(X) = r] - \Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[f(X) = r] \Big| < \tfrac{1}{36}\, 2^{-q} < \tfrac{1}{3} \Pr_{f \sim \mathcal{D}}[f(X) = r].$$

Then by Bayes’ theorem

$$\Big| \Pr_{f \sim \mathcal{D}}[f \in \mathcal{P} \mid f(X) = r] - \Pr_{f \sim \mathcal{D}}[f \ \epsilon\text{-far from } \mathcal{P} \mid f(X) = r] \Big| = \frac{\big| \Pr_{f \sim \mathcal{D}_{\mathrm{yes}}}[f(X) = r] - \Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[f(X) = r] \big|}{2 \Pr_{f \sim \mathcal{D}}[f(X) = r]} < \frac{1}{6}.$$

Therefore, the probability that the testing algorithm correctly classifies a function $f \sim \mathcal{D}$
that lands at a non-negligible leaf $\ell$ is less than $\frac{7}{12}$. So even if the algorithm correctly
classifies all functions that land in negligible leaves, it still correctly classifies $f$ with
probability less than $\frac{11}{12} \cdot \frac{7}{12} + \frac{1}{12} < \frac{2}{3}$, so it is not a valid tester for $\mathcal{P}$. □

Chapter 4

Mathematical Tools


This chapter introduces the mathematical tools that we use in subsequent chapters to analyze
property testing algorithms and to establish lower bounds on the query complexity
for different property testing tasks.

Many  of  the  tools  that  we  use  are  taken  from  probability  theory  and  statistics.  We 
introduce  those  tools  first.  In  the  subsequent  sections,  we  then  introduce  the  results  from 
combinatorics  and  from  the  theory  of  orthogonal  polynomials  that  we  will  require  in  the 
following  chapters. 


4.1  Probability  theory 

We adopt most of the standard terminology of probability theory as found, for example,
in [50, 51].

Many of our arguments in later chapters require us to show that two related distributions are
“close”. We formalize this concept with the total variation distance between distributions.

Definition 4.1. Given two random variables $X, Y$ defined on a common discrete sample
space $\Omega$, the total variation distance between $X$ and $Y$ is

$$d_{TV}(X, Y) = \frac{1}{2} \sum_{\omega \in \Omega} \big| \Pr[X = \omega] - \Pr[Y = \omega] \big|.$$
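
For distributions with small finite support, Definition 4.1 can be evaluated directly. The sketch below is illustrative: it assumes each distribution is given as a dictionary mapping outcomes to probabilities, and the name `total_variation` is a hypothetical helper.

```python
def total_variation(p, q):
    """d_TV between two pmfs given as dicts mapping outcomes to probabilities."""
    support = set(p) | set(q)
    # Outcomes missing from a dict are treated as having probability zero.
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)

fair_coin = {0: 0.5, 1: 0.5}
always_zero = {0: 1.0}
print(total_variation(fair_coin, always_zero))  # 0.5
print(total_variation(fair_coin, fair_coin))    # 0.0
```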

The most basic statistics of a random variable $x$ drawn from some distribution $\mathcal{D}$ are
its mean $\mathbf{E}[x]$ and variance $\mathrm{Var}[x] = \mathbf{E}[(x - \mathbf{E}[x])^2]$. When $x = (x_1, \ldots, x_n)$ is a random
variable drawn from a multivariate distribution $\mathcal{D}$, we define its mean to be the vector
$\mathbf{E}[x] = (\mathbf{E}[x_1], \ldots, \mathbf{E}[x_n])$ and we define the covariance matrix of $x$ to be the $n \times n$
matrix $\mathrm{Cov}[x]$ whose $(i,j)$th entry is defined by

$$\mathrm{Cov}[x]_{i,j} = \mathbf{E}[(x_i - \mathbf{E}[x_i])(x_j - \mathbf{E}[x_j])] = \mathbf{E}[x_i x_j] - \mathbf{E}[x_i]\,\mathbf{E}[x_j].$$


4.1.1  Hypergeometric  Distribution 

Consider the experiment where we have $n$ balls, $r$ of which are red, and we draw $d$ balls
uniformly at random without replacement. The distribution on the number $w$ of balls
drawn that are red is called the hypergeometric distribution. We write $\mathcal{H}_{n,r,d}$ to represent
this distribution.

Intuitively, when $n \approx n'$ and $r \approx r'$, the distributions $\mathcal{H}_{n,r,d}$ and $\mathcal{H}_{n',r',d}$ should be
close. The following lemma confirms and formalizes this intuition.
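
This intuition is easy to observe numerically. The sketch below is an illustration with hypothetical helper names: it computes the hypergeometric probability mass function with `math.comb` and measures the total variation distance between two nearby parameter settings. The parameter values are arbitrary examples and are not taken from the thesis.

```python
from math import comb

def hypergeometric_pmf(n, r, d):
    """pmf of the number of red balls among d draws without replacement (n balls, r red)."""
    total = comb(n, d)
    low, high = max(0, d - (n - r)), min(r, d)  # feasible numbers of red balls
    return {w: comb(r, w) * comb(n - r, d - w) / total for w in range(low, high + 1)}

def total_variation(p, q):
    """d_TV between two pmfs given as dicts mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)

print(hypergeometric_pmf(5, 2, 2))  # {0: 0.3, 1: 0.6, 2: 0.1}
print(total_variation(hypergeometric_pmf(1000, 500, 10),
                      hypergeometric_pmf(990, 495, 10)))
```

With $d = 10$ draws, removing 10 balls (5 of them red) from an urn of 1000 balls changes the distribution only slightly, so the second printed distance is small.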

Lemma  4.2.  Let  n,  r,  n',  r',  d  be  non-negative  integers  with  d,  n'  <  pn  for  some  7  < 
Suppose  that  \r  —  ||  <  ty/n  and  \r'  —  y  |  <  t\fn'  hold  for  some  t  <  Then, 

d-T v(fdn,r,diTdn—n',r—r',cl)  5;  c(l  T  f)7  • 

holds  for  some  universal  constant  c. 

Proof. Our proof uses the connection between the hypergeometric distribution and the binomial
distribution, which we denote by $\mathcal{B}_{n,p}$ (for $n$ experiments, each with success probability $p$).
Specifically, we use the following two lemmas.

Lemma  4.3  (Example  1  in  [94]).  dTy {'Hnrdi  £><*  r)  <  -• 

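Lemma 4.4 ([1]). Let $0 < p < 1$ and $0 < \delta < 1 - p$. Then,

$$d_{TV}(\mathcal{B}_{n,p}, \mathcal{B}_{n,p+\delta}) \le \frac{\sqrt{e}}{2} \cdot \frac{\tau_{n,p}(\delta)}{(1 - \tau_{n,p}(\delta))^2}$$

provided $\tau_{n,p}(\delta) < 1$ where

$$\tau_{n,p}(\delta) = \delta \sqrt{\frac{n + 2}{2p(1 - p)}}.$$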

The lemma trivially holds when $k = 0$ so from now on we assume $k \ge 1$. By the
triangle inequality,

$$d_{TV}(\mathcal{H}_{n,r,d}, \mathcal{H}_{n-n',r-r',d}) \le d_{TV}(\mathcal{H}_{n,r,d}, \mathcal{B}_{d,p}) + d_{TV}(\mathcal{H}_{n-n',r-r',d}, \mathcal{B}_{d,p'}) + d_{TV}(\mathcal{B}_{d,p}, \mathcal{B}_{d,p'}) \qquad (4.1)$$

where $p = \frac{r}{n}$ and $p' = \frac{r - r'}{n - n'}$.

Now, we assume $p < p'$ (the other case can be treated in the same manner). Let
$\delta = p' - p$, then

$$\delta = \frac{rn' - nr'}{n(n - n')} \le \frac{t(n\sqrt{n'} + \sqrt{n}\,n')}{n(n - n')} \le \frac{2t\sqrt{\gamma}\,n^{3/2}}{(1 - \gamma)n^2} \le 4t\sqrt{\frac{\gamma}{n}} \quad \Big(\text{from } \gamma \le \tfrac{1}{2}\Big).$$


Then,  77^  (<5)  in  Lemma  4.4  is 


TkM  <  At\l}l\J 


2p{l  -  p) 


<  4 t\l—  (from  k  >1) 
n 


7  /  *  +  2  <^(*  +  2)  (from  1  <4) 


n 


p{l-p) 


<  4v^6f7  (from  k  <  7 n ) 

Note  that,  from  the  assumption,  we  have  Tk,p(S)  <  From  Lemmas  4.3  and  4.4,  we  have 

(4  1)  <  k  1  k  1 

1  ;  -  n  +  n  +  2  (1  -  7*,(<S))2 

<  27  +  2^/e  •  4\/6f7  (from  rfciP(<5)  <  ^) 

<  c(l  +  f)7 

for  some  universal  constant  c.  □ 


4.1.2  Random  permutations 

A permutation $\pi : [n] \to [n]$ is a bijection on $[n]$. The set of permutations on $[n]$ is denoted
by $S_n$. Fix $i \in [n]$. When $\pi(i) = i$, we say that $i$ is a fixed point of $\pi$. For $T \subseteq [n]$, we write
$S_T$ to represent the set of permutations $\pi$ where each element $i \in [n] \setminus T$ is a fixed point of $\pi$.

A permutation $\pi \in S_n$ acts on vectors in $\{0,1\}^n$ in the natural way: for $x \in \{0,1\}^n$,
we define

$$\pi x = (x_{\pi(1)}, \ldots, x_{\pi(n)}).$$

In Chapter 6, we will be interested in bounding the total variation distance between permuted
strings under slightly different choices of random permutations. The following
lemma shows that this distance is equal to the total variation distance between related
hypergeometric distributions.

Lemma 4.5. Fix $J, K \subseteq [n]$ and $x \in \{0,1\}^n$. Let $D_{\pi_{J \cup K} x}$ and $D_{\pi_J \pi_K x}$ be the distributions
on $\pi_{J \cup K} x$ and $\pi_J \pi_K x$, respectively, when $\pi_{J \cup K}$, $\pi_J$, $\pi_K$ are drawn uniformly at random
from $S_{J \cup K}$, $S_J$, $S_K$, respectively. Then

$$d_{TV}(D_{\pi_{J \cup K} x}, D_{\pi_J \pi_K x}) = d_{TV}\big(\mathcal{H}_{|J \cup K|,\, |x_{J \cup K}|,\, |K \setminus J|},\ \mathcal{H}_{|K|,\, |x_K|,\, |K \setminus J|}\big).$$

Proof. Since both distributions $D_{\pi_{J \cup K} x}$ and $D_{\pi_J \pi_K x}$ only modify coordinates in $J \cup K$,
we can ignore all other coordinates. Moreover, it in fact suffices to look only at the
number of ones in the coordinates of $K \setminus J$ and $J \cup K$, which completely determines
the distributions. Let $D_z$ denote the uniform distribution over all elements $y \in \{0,1\}^n$
such that $|y| = |x|$, $y_{\overline{J \cup K}} = x_{\overline{J \cup K}}$ and $|y_{K \setminus J}| = z$. (This also fixes the number of
ones in $y_J$.) Notice that this distribution is well defined only for values of $z$ such that
$\max\{0, |x_{J \cup K}| - |J|\} \le z \le \min\{|x_{J \cup K}|, |K \setminus J|\}$.

Given this notation, $D_{\pi_{J \cup K} x}$ can be viewed as choosing $z \sim \mathcal{H}_{|J \cup K|,\, |x_{J \cup K}|,\, |K \setminus J|}$ and
returning $y \sim D_z$. This is because we apply a random permutation over all elements of
$J \cup K$, and therefore the number of ones inside $K \setminus J$ is indeed distributed like $z$. Moreover,
the order inside both sets $K \setminus J$ and $J$ is uniform.

The distribution $D_{\pi_J \pi_K x}$ can be viewed as choosing $z \sim \mathcal{H}_{|K|,\, |x_K|,\, |K \setminus J|}$ and returning
$y \sim D_z$. The number of ones in $K \setminus J$ is already determined after applying $\pi_K$. It is
distributed like $z$ since we choose the $|K \setminus J|$ elements out of the $|K|$ elements, and $|x_K|$
of them are ones (and their order is uniform). Later, we apply a random permutation $\pi_J$
over all other relevant coordinates, so the order of elements in $J$ is also uniform.

Since the distributions $D_z$ are disjoint for different values of $z$, this implies that the
distance between the two distributions $D_{\pi_{J \cup K} x}$ and $D_{\pi_J \pi_K x}$ depends only on the number
of ones chosen to be inside $K \setminus J$. Therefore we have

$$d_{TV}(D_{\pi_{J \cup K} x}, D_{\pi_J \pi_K x}) = d_{TV}\big(\mathcal{H}_{|J \cup K|,\, |x_{J \cup K}|,\, |K \setminus J|},\ \mathcal{H}_{|K|,\, |x_K|,\, |K \setminus J|}\big)$$

as required. □

4.1.3 U-statistics


Many of the statistics on functions $f : \{0,1\}^n \to \{0,1\}$ that we need to evaluate in this
thesis are of the form $\tau = \mathbf{E}_x[\phi(f(x))]$ for some function $\phi : \mathbb{R} \to \mathbb{R}$. These statistics
can be estimated efficiently by sampling inputs $x^1, \ldots, x^m$ independently at random and
evaluating $\tilde\tau = m^{-1} \sum_{i=1}^{m} \phi(f(x^i))$. The error of $\tilde\tau$ can then be controlled by the Chernoff
bound or other similar concentration inequalities.

In Chapter 12, we need to estimate a slightly different kind of statistic: one that is of
the form $\tau = \mathbf{E}_{x,y}[\psi(f(x), f(y))]$ for some $\psi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. The best way to estimate this
statistic is by using U-statistics, a tool first introduced by Halmos [64] and Hoeffding [66].

Definition 4.6. The U-statistic (of order 2) with symmetric kernel function $g : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$
is the function $U_g^m : (\mathbb{R}^n)^m \to \mathbb{R}$ defined by

$$U_g^m(x^1, \ldots, x^m) := \binom{m}{2}^{-1} \sum_{1 \le i < j \le m} g(x^i, x^j).$$

Tight concentration bounds are known for U-statistics with well-behaved kernel functions.
The specific result that we use in Chapter 12 is a Bernstein-type inequality due to
Arcones [9].

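Theorem 4.7 (Arcones [9]). For a symmetric function $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, let
$\Sigma^2 = \mathbf{E}_x[\mathbf{E}_y[h(x, y)]^2] - \mathbf{E}_{x,y}[h(x, y)]^2$, let $b = \|h - \mathbf{E}h\|_\infty$, and let $U_m(h)$ be a random
variable obtained by drawing $x^1, \ldots, x^m$ independently at random and setting
$U_m(h) = \binom{m}{2}^{-1} \sum_{i<j} h(x^i, x^j)$. Then for every $t > 0$,

$$\Pr[|U_m(h) - \mathbf{E}h| > t] \le 4 \exp\left( -\frac{m t^2}{8\Sigma^2 + 100\, b t} \right).$$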

For  details  on  U-statistics,  Arcones’  theorem  and  other  related  topics,  see  [44]. 
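
An order-2 U-statistic is simply the average of the kernel over all unordered pairs of samples. The sketch below is illustrative and uses scalar samples for brevity. The helper name `u_statistic` and the example kernel are assumptions for the illustration. The kernel $g(a, b) = (a - b)^2 / 2$ is a standard example whose U-statistic is the unbiased sample variance.

```python
from itertools import combinations
from math import comb

def u_statistic(kernel, samples):
    """Order-2 U-statistic: average of kernel(x_i, x_j) over all pairs i < j."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    return sum(kernel(a, b) for a, b in combinations(samples, 2)) / comb(m, 2)

def half_squared_difference(a, b):
    """Symmetric kernel whose U-statistic equals the unbiased sample variance."""
    return (a - b) ** 2 / 2

print(u_statistic(half_squared_difference, [1, 2, 3, 4]))  # approximately 1.6667 (= 5/3)
```

The computation takes $\binom{m}{2}$ kernel evaluations, which is the price paid for using every pair of samples rather than $m/2$ disjoint pairs.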

4.1.4  Random  matrices 

Our  study  of  the  active  testing  model  in  Chapter  12  and,  more  specifically,  the  lower 
bounds  on  the  query  complexity  for  testing  linear  threshold  functions  in  this  model,  rely 
on  the  non-asymptotic  analysis  of  random  matrices.  In  this  short  section,  we  introduce 
the  definitions  and  results  we  will  need  for  our  intended  application.  For  a  more  thorough 
introduction  to  the  subject,  see  [98]. 

We begin with some basic matrix definitions. Given an $n \times m$ matrix $A$ with real
entries $\{a_{i,j}\}_{i \in [n], j \in [m]}$, the adjoint (or transpose; the two are equivalent since $A$ contains
only real values) of $A$ is the $m \times n$ matrix $A^*$ whose $(i,j)$-th entry equals $a_{j,i}$. Let us
write $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ to denote the eigenvalues of $\sqrt{A^* A}$. These values are the
singular values of $A$. The matrix $A^* A$ is positive semidefinite, so the singular values of $A$
are all non-negative. We write $\lambda_{\max}(A) = \lambda_1$ and $\lambda_{\min}(A) = \lambda_m$ to represent its largest
and smallest singular values. Finally, the induced norm (or operator norm) of $A$ is

$$\|A\| = \max_{x \in \mathbb{R}^m \setminus \{0\}} \frac{\|Ax\|_2}{\|x\|_2} = \max_{x \in \mathbb{R}^m : \|x\|_2 = 1} \|Ax\|_2.$$

For  more  details  on  these  definitions,  see  any  standard  linear  algebra  text  (e.g.,  [93]).  We 
will  also  use  the  following  strong  concentration  bounds  on  the  singular  values  of  random 
matrices. 

Lemma 4.8 (See [98, Cor. 5.35]). Let $A$ be an $n \times m$ matrix whose entries are independent
standard normal random variables. Then for any $t > 0$, the singular values of $A$ satisfy

$$\sqrt{n} - \sqrt{m} - t \le \lambda_{\min}(A) \le \lambda_{\max}(A) \le \sqrt{n} + \sqrt{m} + t \qquad (4.2)$$

with probability at least $1 - 2e^{-t^2/2}$.


The  proof  of  this  lemma  follows  from  Talagrand’s  inequality  and  Gordon’s  Theorem 
for  Gaussian  matrices.  See  [98]  for  the  details.  The  lemma  implies  the  following  corollary 
which  we  will  use  in  the  proof  of  our  theorem. 


Corollary  4.9.  Let  A  be  an  n  x  m  matrix  whose  entries  are  independent  standard  normal 
random  variables.  For  any  0  <  t  <  yfn  —  y/m,  the  rrt  x  m  matrix  -A* A  satisfies  both 
inequalities 


I -A* A  —  ill  <  3 


m  +  t 


n 


and  det  {/A* A)  >  e 


/  (\A ™+t)2  0  -y/m+t 


J 


(4.3) 


with  probability  at  least  1  —  2e  *2/2. 


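Proof. When there exists $0 < z < 1$ such that $1 - z \le \frac{1}{\sqrt{n}} \lambda_{\max}(A) \le 1 + z$, the identity
$\frac{1}{\sqrt{n}} \lambda_{\max}(A) = \|\frac{1}{\sqrt{n}} A\| = \max_{\|x\|_2 = 1} \|\frac{1}{\sqrt{n}} Ax\|_2$ implies that

$$1 - 2z \le (1 - z)^2 \le \max_{\|x\|_2 = 1} \big\|\tfrac{1}{\sqrt{n}} Ax\big\|_2^2 \le (1 + z)^2 \le 1 + 3z.$$

These inequalities and the identity $\|\frac{1}{n} A^* A - I\| = \max_{\|x\|_2 = 1} \big| \|\frac{1}{\sqrt{n}} Ax\|_2^2 - 1 \big|$ imply that
$-2z \le \|\frac{1}{n} A^* A - I\| \le 3z$. Fixing $z = \frac{\sqrt{m} + t}{\sqrt{n}}$ and applying Lemma 4.8 completes the
proof of the first inequality.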

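Recall that $\lambda_1 \ge \cdots \ge \lambda_m$ are the eigenvalues of $\sqrt{A^* A}$. Then

$$\det\big(\tfrac{1}{n} A^* A\big) = \frac{\det(\sqrt{A^* A})^2}{n^m} = \frac{(\lambda_1 \cdots \lambda_m)^2}{n^m} \ge \left( \frac{\lambda_{\min}(A)^2}{n} \right)^m.$$

Lemma 4.8 and the elementary inequality $1 + x \le e^x$ complete the proof of the second
inequality. □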


4.2  Combinatorics 

4.2.1  Intersecting  families 

A family $\mathcal{F}$ of subsets of $[n]$ is $t$-intersecting if for every pair of sets $S, T \in \mathcal{F}$, their
intersection size is at least $|S \cap T| \ge t$. The family $\mathcal{F}$ is called $s$-uniform if all sets in
the family have size $s$. Erdős, Ko, and Rado [49] asked: what is the maximum size of a
$t$-intersecting $s$-uniform family? They gave the answer to this question when $t = 1$, and a
sequence of later works led to a complete solution for this question [49, 53, 99, 2].

More recently, Dinur and Safra [47] and Friedgut [55] considered a variant on the original question of Erdős, Ko, and Rado. For a fixed $0 < p < 1$, define the p-biased measure of a family $\mathcal{F}$ of subsets of $[n]$ to be

$$\mu_p(\mathcal{F}) = \Pr_J[J \in \mathcal{F}]$$

where $J$ is a random subset of $[n]$ obtained by including each element $i \in [n]$ in $J$ independently with probability $p$. We can now ask: for a fixed $p$, what is the maximum p-biased measure of a t-intersecting family? Dinur and Safra [47] showed that 2-intersecting families have small p-biased measure and Friedgut [55] showed how the same result also extends to t-intersecting families for any $t > 2$. Specifically, they obtained the following bound on the maximum biased measure of intersecting families.

Theorem 4.10 (Dinur and Safra [47]; Friedgut [55]). Let $\mathcal{F}$ be a t-intersecting family of subsets of $[n]$ for some $t \ge 1$. For any $p < \frac{1}{t+1}$, the p-biased measure of $\mathcal{F}$ is bounded by

$$\mu_p(\mathcal{F}) \le p^t.$$

The original motivation of Dinur and Safra [47] in the study of intersecting families was an application in hardness of approximation for the vertex cover problem. As we will see in Chapters 5 and 6, Theorem 4.10 is also particularly useful for the analysis of algorithms for testing juntas and partial symmetry.
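As a sanity check on Theorem 4.10, the following sketch computes the p-biased measure by brute force for two small 2-intersecting families. The helper names (`subsets`, `is_t_intersecting`, `biased_measure`) and the parameters $n = 6$, $p = 1/4$ are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from itertools import combinations


def subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)


def is_t_intersecting(family, t):
    """True if every pair of sets in the family shares at least t elements."""
    return all(len(a & b) >= t for a in family for b in family)


def biased_measure(family, n, p):
    """Probability that a p-biased random subset of {0, ..., n-1} is in the family."""
    return sum(p ** len(s) * (1 - p) ** (n - len(s)) for s in family)


n, t, p = 6, 2, Fraction(1, 4)  # illustrative parameters with p < 1/(t+1)

# Family A: every set that contains both 0 and 1.
family_a = [s for s in subsets(n) if {0, 1} <= s]
# Family B: every set with at least 3 of the elements 0, 1, 2, 3.
family_b = [s for s in subsets(n) if len(s & {0, 1, 2, 3}) >= 3]

measure_a = biased_measure(family_a, n, p)
measure_b = biased_measure(family_b, n, p)
```

Family A meets the bound $p^t$ with equality, while family B stays strictly below it for this choice of $p$.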


4.2.2  Graph  colorings 

The topic of graph coloring, that is, of assigning colors to the vertices of a graph $G = (V, E)$ such that no two neighboring vertices share the same color, has developed into a broad area of research with many interesting problems and results (see [67, 78]).

In Chapter 9, we make use of a celebrated theorem of this area due to Hajnal and Szemerédi [60]. Recall that the degree of a vertex in a graph is the number of edges incident to that vertex. Hajnal and Szemerédi's theorem states that graphs with small maximum vertex degree can be colored with very few colors and, furthermore, that this coloring can be done in a way that each color is assigned to approximately the same number of vertices.

Theorem 4.11 (Hajnal-Szemerédi [60]). Let $G$ be a graph on $n$ vertices with maximum vertex degree $\Delta(G) \le d$. Then $G$ has a $(d+1)$-coloring in which all the color classes have size $\lfloor \frac{n}{d+1} \rfloor$ or $\lceil \frac{n}{d+1} \rceil$.
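To make the conclusion of Theorem 4.11 concrete, here is a small sketch that checks whether a given coloring is proper and has balanced color classes. It only verifies a coloring; it does not implement the (non-trivial) proof of the theorem. The function name `is_equitable_coloring` and the 7-cycle example are illustrative assumptions.

```python
from collections import Counter


def is_equitable_coloring(n, edges, coloring, num_colors):
    """Check that `coloring` is proper and that all class sizes are floor or ceiling of n / num_colors."""
    if any(coloring[u] == coloring[v] for u, v in edges):
        return False  # two neighbors share a color
    sizes = Counter(coloring[v] for v in range(n))
    low, high = n // num_colors, -(-n // num_colors)  # floor and ceiling
    return all(low <= sizes.get(c, 0) <= high for c in range(num_colors))


# Illustrative graph: a cycle on 7 vertices, so the maximum degree is d = 2.
n, d = 7, 2
edges = [(i, (i + 1) % n) for i in range(n)]
balanced = [0, 1, 2, 0, 1, 2, 1]    # class sizes 2, 3, 2
unbalanced = [0, 1, 0, 1, 0, 1, 2]  # proper, but class sizes 3, 3, 1
```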

4.3  Orthogonal  polynomials 

The last tools that we introduce in this section are orthogonal polynomials. The Krawtchouk polynomials, which have been quite useful in coding theory [97], will be used in Chapter 7 to give a strong lower bound on the query complexity for testing k-linearity. The Hermite polynomials, which we introduce afterwards, are used in Chapter 12 in the analysis of testers for linear threshold functions.


4.3.1  Krawtchouk  polynomials 


For $n > 0$ and $k = 0, 1, \ldots, n$, the (binary) Krawtchouk polynomial $K_k^n : \{0, 1, \ldots, n\} \to \mathbb{Z}$ is defined by

$$K_k^n(m) = \sum_{j=0}^{k} (-1)^j \binom{m}{j} \binom{n-m}{k-j}.$$

The generating function representation of the Krawtchouk polynomial $K_k^n(m)$ is

$$K_k^n(m) = [x^k]\,(1 - x)^m (1 + x)^{n-m}.$$
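The two descriptions of $K_k^n(m)$ can be compared numerically. The sketch below evaluates the alternating binomial sum (the form that also appears in the proof of Fact 4.13) and, separately, extracts the coefficient of $x^k$ from $(1-x)^m(1+x)^{n-m}$ by multiplying out the polynomial. The function names are illustrative.

```python
from math import comb


def krawtchouk(n, k, m):
    """K_k^n(m) as the alternating sum over j of C(m, j) * C(n - m, k - j)."""
    return sum((-1) ** j * comb(m, j) * comb(n - m, k - j) for j in range(k + 1))


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_pow(base, exponent):
    """Raise a polynomial to a non-negative integer power."""
    result = [1]
    for _ in range(exponent):
        result = poly_mul(result, base)
    return result


def krawtchouk_gf(n, k, m):
    """Coefficient of x^k in (1 - x)^m * (1 + x)^(n - m)."""
    coeffs = poly_mul(poly_pow([1, -1], m), poly_pow([1, 1], n - m))
    return coeffs[k]
```

For instance, both functions give $K_1^n(m) = n - 2m$ and $K_2^4(2) = -2$.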

Krawtchouk  polynomials  have  a  number  of  useful  properties  (see,  e.g.,  [96]).  We  make 
use  of  the  following  identities  in  our  proofs: 

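Fact 4.12. Fix $n > 0$. Then

i. For every $2 \le k \le n$, $K_k^n(m) - K_{k-2}^n(m) = K_k^{n+2}(m+1)$.

ii. $\sum_{k=0}^{n} K_k^n(m)^2 = (-1)^m K_n^{2n}(2m)$.

iii. For every $0 \le d \le n-1$, $\sum_{j=0}^{d} \binom{d}{j} (-1)^j K_n^{2n}(2j+2) = 2^{2d} K_{n-d}^{2(n-d)}(2)$.

iv. For every $-\frac{n}{2} \le k \le \frac{n}{2}$, $K_{\frac{n}{2}+k}^n(m) = \frac{2^{n-1} i^m}{\pi} \int_0^{2\pi} \sin^m\theta \, \cos^{n-m}\theta \, e^{i2k\theta} \, d\theta$.

v. $K_n^{2n}(2m+1) = 0$ and $(-1)^m K_n^{2n}(2m)$ is positive and decreasing in $\min\{m, n-m\}$.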

Proof.  We  prove  each  statement  individually. 

i. The first statement follows directly from the generating function representation of Krawtchouk polynomials:

$$K_k^n(m) - K_{k-2}^n(m) = \left([x^k]\,(1-x)^m(1+x)^{n-m}\right) - \left([x^{k-2}]\,(1-x)^m(1+x)^{n-m}\right)$$
$$= [x^k]\,(1-x)^m(1+x)^{n-m}(1-x^2)$$
$$= [x^k]\,(1-x)^{m+1}(1+x)^{n-m+1} = K_k^{n+2}(m+1).$$

ii. By some more elementary manipulation of generating functions, we have

$$K_k^n(m) = [x^k]\,(1-x)^m(1+x)^{n-m}$$
$$= [x^{-k}]\,\left(1 - \tfrac{1}{x}\right)^m \left(1 + \tfrac{1}{x}\right)^{n-m}$$
$$= [x^{n-k}]\,(x-1)^m(x+1)^{n-m} = (-1)^m K_{n-k}^n(m).$$

Therefore,

$$\sum_{k=0}^{n} K_k^n(m)^2 = (-1)^m \sum_{k=0}^{n} K_k^n(m) K_{n-k}^n(m).$$

The Cauchy product of two power series with coefficient sequences $\{a_0, a_1, \ldots\}$ and $\{b_0, b_1, \ldots\}$ is

$$\left(\sum_{n \ge 0} a_n x^n\right)\left(\sum_{n \ge 0} b_n x^n\right) = \sum_{n \ge 0} \left(\sum_{k=0}^{n} a_k b_{n-k}\right) x^n.$$

Let $a_k = b_k = [x^k]\,(1-x)^m(1+x)^{n-m}$. Then $\sum_{n \ge 0} a_n x^n = (1-x)^m(1+x)^{n-m}$ and

$$\sum_{k=0}^{n} K_k^n(m) K_{n-k}^n(m) = [x^n]\,(1-x)^{2m}(1+x)^{2n-2m} = K_n^{2n}(2m).$$

iii. Considering generating functions and applying the binomial theorem, we get

$$\sum_{j=0}^{d} \binom{d}{j} (-1)^j K_n^{2n}(2j+2) = [x^n] \sum_{j=0}^{d} \binom{d}{j} (-1)^j (1-x)^{2j+2}(1+x)^{2n-2j-2}$$
$$= [x^n]\,(1-x)^2(1+x)^{2n-2d-2}\left((1+x)^2 - (1-x)^2\right)^d$$
$$= [x^n]\,(1-x)^2(1+x)^{2n-2d-2}(4x)^d = 2^{2d} K_{n-d}^{2(n-d)}(2).$$


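iv. By elementary manipulation of generating functions, we obtain

$$K_{\frac{n}{2}+k}^n(m) = [x^{\frac{n}{2}+k}]\,(1-x)^m(1+x)^{n-m}$$
$$= [x^k]\,\left(\tfrac{1}{\sqrt{x}} - \sqrt{x}\right)^m \left(\tfrac{1}{\sqrt{x}} + \sqrt{x}\right)^{n-m}$$
$$= [x^{-2k}]\,\left(x - \tfrac{1}{x}\right)^m \left(x + \tfrac{1}{x}\right)^{n-m}.$$

Applying Cauchy's integral formula to this expression, we get

$$K_{\frac{n}{2}+k}^n(m) = \frac{1}{2\pi} \int_0^{2\pi} \left(e^{i\theta} - e^{-i\theta}\right)^m \left(e^{i\theta} + e^{-i\theta}\right)^{n-m} e^{i2k\theta} \, d\theta.$$

From the trigonometric identities $\sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}$ and $\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$, we get

$$K_{\frac{n}{2}+k}^n(m) = \frac{2^n}{2\pi}\, i^m \int_0^{2\pi} \sin^m\theta \, \cos^{n-m}\theta \, e^{i2k\theta} \, d\theta.$$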


v. By the last statement, $K_n^{2n}(2m+1)$ is pure imaginary. Since it is also real, it must be 0.

The last statement also yields

$$(-1)^m K_n^{2n}(2m) = \frac{2^{2n-1}}{\pi} \int_0^{2\pi} \sin^{2m}(\theta) \cos^{2n-2m}(\theta) \, d\theta$$
$$= \frac{2^{2n-2}}{\pi} \int_0^{2\pi} \left( \sin^{2m}(\theta) \cos^{2n-2m}(\theta) + \cos^{2m}(\theta) \sin^{2n-2m}(\theta) \right) d\theta.$$

By AM-GM, for fixed $n$, the integrand is a decreasing function of $\min\{m, n-m\}$. □
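The identities i, ii, iii and v of Fact 4.12 involve only integers, so they can be checked exhaustively for small $n$. The sketch below does this, assuming that the range in identity iii is $0 \le d \le n-1$ (the range for which every term is defined). Identity iv involves a complex integral and is not checked here. The function name `check_fact_4_12` is illustrative.

```python
from math import comb


def krawtchouk(n, k, m):
    """K_k^n(m) as the alternating sum over j of C(m, j) * C(n - m, k - j)."""
    return sum((-1) ** j * comb(m, j) * comb(n - m, k - j) for j in range(k + 1))


def check_fact_4_12(n):
    """Return the labels of the identities (i, ii, iii, v) that fail for this n."""
    failures = set()
    for m in range(n + 1):
        for k in range(2, n + 1):
            lhs = krawtchouk(n, k, m) - krawtchouk(n, k - 2, m)
            if lhs != krawtchouk(n + 2, k, m + 1):
                failures.add("i")
        squares = sum(krawtchouk(n, k, m) ** 2 for k in range(n + 1))
        if squares != (-1) ** m * krawtchouk(2 * n, n, 2 * m):
            failures.add("ii")
    for d in range(n):
        lhs = sum(comb(d, j) * (-1) ** j * krawtchouk(2 * n, n, 2 * j + 2)
                  for j in range(d + 1))
        if lhs != 4 ** d * krawtchouk(2 * (n - d), n - d, 2):
            failures.add("iii")
    # Identity v: odd arguments vanish; signed even values are positive,
    # symmetric under m -> n - m, and decrease as min(m, n - m) grows.
    signed = [(-1) ** m * krawtchouk(2 * n, n, 2 * m) for m in range(n + 1)]
    odd_zero = all(krawtchouk(2 * n, n, 2 * m + 1) == 0 for m in range(n))
    positive = all(v > 0 for v in signed)
    symmetric = all(signed[m] == signed[n - m] for m in range(n + 1))
    decreasing = all(signed[m] > signed[m + 1] for m in range(n // 2))
    if not (odd_zero and positive and symmetric and decreasing):
        failures.add("v")
    return failures
```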

Krawtchouk polynomials are widely used in coding theory (see, e.g., [97]) and in our proofs because of their close connection with the Fourier coefficients of the (Hamming weight indicator) function $\mathrm{IW}_k : \{0,1\}^n \to \{0,1\}$ defined by $\mathrm{IW}_k(x) = \mathbf{1}[|x| = k]$.

Fact 4.13. Fix $n > 0$, $0 \le k \le n$, and $\alpha \in \{0,1\}^n$. Then $\widehat{\mathrm{IW}_k}(\alpha) = 2^{-n} K_k^n(|\alpha|)$.


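Proof. The Fourier coefficient of $\mathrm{IW}_k$ at $\alpha$ is

$$\widehat{\mathrm{IW}_k}(\alpha) = 2^{-n} \sum_{x \in \{0,1\}^n : |x| = k} (-1)^{\alpha \cdot x} = 2^{-n} \sum_{j=0}^{k} (-1)^j \binom{|\alpha|}{j} \binom{n - |\alpha|}{k-j} = 2^{-n} K_k^n(|\alpha|).$$ □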
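Fact 4.13 can be confirmed by direct enumeration for small $n$. The sketch below computes the Fourier coefficient of the Hamming weight indicator from its definition and leaves the comparison with $2^{-n} K_k^n(|\alpha|)$ to the caller. It assumes the convention $\hat{f}(\alpha) = 2^{-n} \sum_x f(x)(-1)^{\alpha \cdot x}$ used in the proof above; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def krawtchouk(n, k, m):
    """K_k^n(m) as the alternating sum over j of C(m, j) * C(n - m, k - j)."""
    return sum((-1) ** j * comb(m, j) * comb(n - m, k - j) for j in range(k + 1))


def weight_indicator_coefficient(n, k, alpha):
    """Fourier coefficient of IW_k at alpha, computed by enumerating all inputs."""
    total = 0
    for x in product((0, 1), repeat=n):
        if sum(x) == k:  # IW_k(x) = 1 exactly when x has Hamming weight k
            total += (-1) ** sum(a * b for a, b in zip(alpha, x))
    return Fraction(total, 2 ** n)
```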


4.3.2  Hermite  polynomials 

When considering the uniform distribution on $\{0,1\}^n$, we found that the set of linear functions, via the Fourier transform, played an important role in the analysis of boolean functions. When, instead, we consider the standard Gaussian distribution on $\mathbb{R}^n$, the Hermite polynomials play a similarly important role.

Definition 4.14. The Hermite polynomials are a set of polynomials

$h_0(x) = 1$,
$h_1(x) = x$,
$h_2(x) = \frac{1}{\sqrt{2}}(x^2 - 1)$,
$\ldots$

that form a complete orthogonal basis for (square-integrable) functions $f : \mathbb{R} \to \mathbb{R}$ over the inner product space defined by the inner product $\langle f, g \rangle = \mathbb{E}_x[f(x)g(x)]$, where the expectation is over the standard Gaussian distribution $\mathcal{N}(0, 1)$.

The Hermite decomposition and Hermite coefficients of a function are defined as follows.

Definition 4.15. For any $S \in \mathbb{N}^n$, define $H_S(x) = \prod_{i=1}^{n} h_{S_i}(x_i)$. The Hermite coefficient of $f : \mathbb{R}^n \to \mathbb{R}$ corresponding to $S$ is $\hat{f}(S) = \langle f, H_S \rangle = \mathbb{E}_x[f(x)H_S(x)]$.

Definition 4.16. The Hermite decomposition of $f : \mathbb{R}^n \to \mathbb{R}$ is $f(x) = \sum_{S \in \mathbb{N}^n} \hat{f}(S) H_S(x)$.

Definition 4.17. The degree of the coefficient $\hat{f}(S)$ is $|S| := \sum_{i=1}^{n} S_i$, and the level-k Hermite weight of $f$ is $\sum_{S : |S| = k} \hat{f}(S)^2$.

The following basic lemma regarding the level-1 Hermite weight of functions will be of fundamental importance in the analysis in Chapter 12.

Lemma 4.18. For any function $f : \mathbb{R}^n \to \mathbb{R}$, we have

$$\sum_{i=1}^{n} \hat{f}(e_i)^2 = \mathbb{E}_{x,y}[f(x)f(y)\langle x, y \rangle]$$

where $\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$ is the standard vector dot product.

Proof. Applying the Hermite decomposition of $f$ and linearity of expectation,

$$\mathbb{E}_{x,y}[f(x)f(y)\langle x, y \rangle] = \sum_{i=1}^{n} \sum_{S, T \in \mathbb{N}^n} \hat{f}(S)\hat{f}(T)\, \mathbb{E}_x[H_S(x) x_i]\, \mathbb{E}_y[H_T(y) y_i].$$

By definition, $x_i = h_1(x_i) = H_{e_i}(x)$. The orthonormality of the Hermite polynomials therefore guarantees that $\mathbb{E}_x[H_S(x)H_{e_i}(x)] = 1$ when $S = e_i$; otherwise it takes the value 0. Similarly, $\mathbb{E}_y[H_T(y) y_i] = 1$ iff $T = e_i$. □

Part I

Exact Query Complexity

Chapter 5
Testing Juntas


We  begin  by  studying  the  problem  of  testing  juntas — that  is,  of  testing  if  a  function  has  at 
most  k  relevant  variables  for  some  fixed  k.  The  motivation  for  studying  juntas  comes  from 
the  fundamental  role  of  these  functions  in  different  areas  of  computer  science. 


Motivation. In learning theory, juntas provide an elegant and useful framework for studying the problem of learning in the presence of irrelevant attributes [27, 30, 29]. More precisely, a target function is a k-junta when only $k$ of the $n$ possible attributes determine the value of the function on all inputs. The problem of learning juntas is a fundamental problem in learning theory [28, 79].

The important role of juntas in learning theory motivates the study of junta testing for two reasons. First, the insights on the structure of juntas learned in the course of research in testing juntas may lead to new learning algorithms as well. Second, junta testers may be used in learning theory directly in the model selection framework. The idea of this application is as follows: if we don't know ahead of time whether a target function is a k-junta or not, we can use a junta tester to quickly test the property and, if the target function is far from being a junta, we may save ourselves the query and computational cost that would have been wasted in trying to learn the target function under the erroneous assumption that it is a junta. (We discuss model selection in more detail in Chapter 12.)

Juntas also play an important role in complexity theory. The special case of testing dictator functions (1-juntas of the form $f(x) = x_i$ for some $i \in [n]$), in particular, has been at the heart of the development of probabilistically checkable proofs (PCPs) and of the corresponding advances in hardness of approximation [16, 17, 15, 65]. Juntas have also appeared directly in some constructions for hardness of approximation results, most notably for the vertex cover [47] and satisfiability of linear equations [75] problems.

Let us also mention briefly that the importance of juntas is not limited to computer science: these functions are also fundamental objects of study in the analysis of boolean functions [32, 54] and in social choice theory [69].

Lastly, juntas play a central role in property testing itself. In fact, in a remarkable result, Diakonikolas et al. [46] showed that efficient junta testers, combined with the testing by implicit learning method, lead to efficient testers for a large number of properties of boolean functions such as low Fourier degree, computability by small decision trees, computability by small boolean circuits, representability by DNFs with a small number of terms, and sparse polynomials.

For many of the query complexity upper bounds obtained by Diakonikolas et al. [46], the barrier to obtaining improved query complexities is the query complexity of the junta test. This state of affairs suggested that improvements on the query complexity for testing juntas were likely to lead to improved query complexities for a number of other properties.


Previous work. The first result on testing dictator functions was obtained by Bellare, Goldreich, and Sudan [15] in the context of testing the "long code" for PCP constructions and further generalized by Parnas, Ron, and Samorodnitsky [84]. These results showed that it is possible to test dictator functions with $O(1/\epsilon)$ queries. This result also immediately implies that 1-juntas can be tested with the same number of queries. (See [21] for a more detailed discussion of testing 1-juntas and the other results mentioned in this section.)

The first general results on testing k-juntas for any $k \ge 1$ were obtained by Fischer et al. [52], who showed that the query complexity for testing k-juntas is $\mathrm{poly}(k/\epsilon)$. Most importantly, this showed that juntas can be tested with a number of queries that is independent of the size of the domain of the functions. In fact, they exhibited a number of different algorithms for testing k-juntas, with the most efficient one requiring $O(k^2 \log^2 k)$ queries.

Fischer et al. [52] also introduced the first lower bounds on the query complexity for testing juntas. They showed that for $k$ small enough, any non-adaptive tester for k-juntas makes at least $\Omega(\sqrt{k}/\log k)$ queries. This lower bound translates to an $\Omega(\log k)$ lower bound for general testers. That lower bound was subsequently improved to $\Omega(k)$ by Chockler and Gutfreund [41].

Result. The main result of this chapter is a new algorithm for testing juntas with a number of queries that nearly matches Chockler and Gutfreund's lower bound. Specifically, we establish the following result.

Theorem 5.1. It is possible to $\epsilon$-test the function $f : \{0,1\}^n \to \{0,1\}$ for the property of being a k-junta with $O(k/\epsilon + k \log k)$ queries.

The  algorithm  that  we  introduce  for  testing  juntas  is  conceptually  quite  simple.  The 
main  technical  component  of  this  chapter  lies  in  the  analysis  of  this  algorithm. 


5.1  The  algorithm 

The JuntaTest algorithm is based on two simple but powerful ideas. The first idea, initially presented by Fischer et al. [52], is that there is a very natural test for determining whether a set $S \subseteq [n]$ of coordinates contains a relevant variable in the function $f : \{0,1\}^n \to \{0,1\}$. This test, which we call the RelevantTest algorithm,$^1$ picks $x, y \in \{0,1\}^n$ independently and uniformly at random conditioned on $x_{\bar{S}} = y_{\bar{S}}$ and tests whether $f(x) \ne f(y)$.


RelevantTest($f$, $S$)

1. Generate $x, z \in \{0,1\}^n$ independently and uniformly at random.

2. Set $y = x_{\bar{S}} \sqcup z_S$.

3. If $f(x) \ne f(y)$, accept and return $(x, y)$.

4. Else, reject.


When none of the variables in $S \subseteq [n]$ are relevant in $f$, the RelevantTest always rejects. In fact, we can say more: the probability that the test accepts is equal to the influence of $S$ in $f$. (See Lemma 5.2 below.) When the RelevantTest accepts, it returns a witness to the fact that $S$ contains a relevant variable in the form of two inputs $x, y \in \{0,1\}^n$ for which $x_{\bar{S}} = y_{\bar{S}}$ and $f(x) \ne f(y)$.
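The behaviour of RelevantTest can be explored on a small example. The sketch below implements one run of the test and, for comparison, computes its exact acceptance probability by enumerating all pairs $(x, z)$. Inputs are represented as tuples of bits and coordinates are numbered from 0; the function names and the example function are illustrative assumptions.

```python
import random
from fractions import Fraction
from itertools import product


def hybrid(x, z, S):
    """Input that agrees with z on the coordinates in S and with x elsewhere."""
    return tuple(z[i] if i in S else x[i] for i in range(len(x)))


def relevant_test(f, n, S, rng):
    """One run of RelevantTest: a witness (x, y) on accept, None on reject."""
    x = tuple(rng.randint(0, 1) for _ in range(n))
    z = tuple(rng.randint(0, 1) for _ in range(n))
    y = hybrid(x, z, S)
    return (x, y) if f(x) != f(y) else None


def acceptance_probability(f, n, S):
    """Exact probability that RelevantTest accepts (feasible for small n only)."""
    inputs = list(product((0, 1), repeat=n))
    hits = sum(f(x) != f(hybrid(x, z, S)) for x in inputs for z in inputs)
    return Fraction(hits, len(inputs) ** 2)


def example_junta(x):
    """Illustrative 2-junta on 4 bits: depends only on coordinates 0 and 2."""
    return x[0] ^ x[2]
```

For this example, a set such as $\{1, 3\}$ with no relevant coordinate is never accepted, while $\{0\}$ and $\{0, 2\}$ are each accepted with probability $1/2$.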

The second idea we use is an observation of Blum, Hellerstein, and Littlestone [29] first made in the context of learning juntas: if we have two inputs $x, y \in \{0,1\}^n$ such that $f(x) \ne f(y)$, then we can perform a binary search over the hybrid vectors between $x$ and $y$ to find a coordinate that is relevant in $f$ with $O(\log n)$ queries. We build on this observation by noting that if we have a partition of the coordinates into $s$ parts and only care to identify a part that contains a relevant coordinate (rather than the coordinate itself), then we can optimize the binary search to only take $O(\log s)$ queries. We call the algorithm that implements this strategy FindRelevantPart.

$^1$ In [52], the test is called the IndependenceTest and reverses the accept and reject actions.

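FindRelevantPart($f$, $\{I_1, \ldots, I_s\}$, $x$, $y$)

1. Initialize $\ell \leftarrow 0$, $u \leftarrow s$.

2. While $u - \ell > 1$,

2.1. Set $m = \ell + \lceil \frac{u - \ell}{2} \rceil$ and $S = \bigcup_{j=m+1}^{s} I_j$.

2.2. Define $z = x_{\bar{S}} \sqcup y_S$.

2.3. If $f(x) = f(z)$, then update $u = m$.

2.4. Else, update $\ell = m$.

3. Return $I_u$.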
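The binary search can be sketched as follows. This is one reading of the pseudocode, under the assumptions that the hybrid $z$ copies $y$ on the parts $I_{m+1}, \ldots, I_s$ and $x$ elsewhere, that $f(x) \ne f(y)$, and that $x$ and $y$ differ only inside the listed parts. Parts are stored in a 0-indexed Python list, and the `CountingOracle` wrapper is a hypothetical helper used only to count queries.

```python
def find_relevant_part(f, parts, x, y):
    """Return the 0-based index of a part that contains a relevant coordinate."""
    fx = f(x)
    lo, hi = 0, len(parts)              # play the roles of l and u in the text
    while hi - lo > 1:
        mid = lo + (hi - lo + 1) // 2   # l + ceil((u - l) / 2)
        z = list(x)
        for part in parts[mid:]:        # parts I_{m+1}, ..., I_s in 1-based numbering
            for i in part:
                z[i] = y[i]
        if f(z) == fx:
            hi = mid
        else:
            lo = mid
    return hi - 1                       # the part I_u in 1-based numbering


class CountingOracle:
    """Wraps a function and counts how many times it is queried."""

    def __init__(self, func):
        self.func = func
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.func(x)


# Illustrative instance: 8 coordinates in 4 parts, f is the dictator x[5].
parts = [[0, 1], [2, 3], [4, 5], [6, 7]]
oracle = CountingOracle(lambda x: x[5])
index = find_relevant_part(oracle, parts, [0] * 8, [1] * 8)
```

In this run the search returns the part containing coordinate 5 after one query for $f(x)$ and $\log_2 4 = 2$ queries on hybrids.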


The JuntaTest algorithm combines the two observations above in the obvious way. It maintains a set $S$ of coordinates that may or may not be relevant to the function, then it runs the RelevantTest a number of times to determine whether $S$ contains a relevant variable. If so, then it obtains a pair $x, y \in \{0,1\}^n$ such that $f(x) \ne f(y)$ and $x_{\bar{S}} = y_{\bar{S}}$. By calling FindRelevantPart with $x$ and $y$, the algorithm identifies a part $I$ that contains a relevant variable. It then removes the variables in $I$ from $S$ and repeats the process. If this algorithm identifies $k + 1$ different parts with relevant coordinates, it rejects the function; otherwise it accepts the function as a k-junta. The details of the algorithm are presented in Figure 5.1.


5.2  Analysis  of  the  Algorithm 

This  section  is  dedicated  to  the  analysis  of  the  JuntaTest  algorithm.  The  first  step  in  this 
analysis  is  to  determine  the  probability  that  RelevantTest  accepts. 

Lemma 5.2. For any function $f : \{0,1\}^n \to \{0,1\}$ and any set $S \subseteq [n]$, a call to RelevantTest($f$, $S$) accepts with probability $\mathrm{Inf}_f(S)$.

Proof. RelevantTest($f$, $S$) accepts iff $f(x) \ne f(x_{\bar{S}} \sqcup z_S)$, where $x$ and $z$ are picked independently and uniformly at random from $\{0,1\}^n$. By Definition 2.15, the probability that $f(x) \ne f(x_{\bar{S}} \sqcup z_S)$ is $\mathrm{Inf}_f(S)$. □

JuntaTest($f$, $k$, $\epsilon$)

Additional parameters: $s = 24k^2$, $r = 12(k+1)/\epsilon$

1. Randomly partition the coordinates in $[n]$ into $\mathcal{I} = \{I_1, \ldots, I_s\}$.

2. Initialize $S \leftarrow [n]$, $\ell \leftarrow 0$.

3. For each of $r$ rounds,

3.1. If RelevantTest($f$, $S$) accepts and returns $(x, y)$, then

3.1.1. $I \leftarrow$ FindRelevantPart($f$, $\mathcal{I}$, $x$, $y$).

3.1.2. Update $S \leftarrow S \setminus I$ and $\ell \leftarrow \ell + 1$.

3.1.3. If $\ell > k$, then reject the function.

4. Accept the function.

Figure 5.1: The algorithm for $\epsilon$-testing k-juntas.

This lemma will enable us to prove the correctness of the JuntaTest algorithm if we can show that when $f$ is far from being a k-junta, every set that includes all but at most $k$ parts of the random partition $\mathcal{I}$ will have large influence. We formalize this statement and prove it in the next sub-section.
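The whole tester of Figure 5.1 fits in a few lines. The following is a minimal sketch, assuming $k \ge 1$, coordinates numbered from 0, and a partition stored as a list `part_of` that maps each coordinate to the index of its part. These representation choices and all function names are illustrative and are not taken from the original text.

```python
import math
import random


def find_relevant_part(f, part_of, s, x, y):
    """Binary search over hybrids of x and y; returns the index of a relevant part."""
    fx, lo, hi = f(x), 0, s
    while hi - lo > 1:
        mid = lo + (hi - lo + 1) // 2
        # z copies y on the parts numbered mid, ..., s-1 and x on the rest
        z = [y[i] if part_of[i] >= mid else x[i] for i in range(len(x))]
        if f(z) == fx:
            hi = mid
        else:
            lo = mid
    return hi - 1


def junta_test(f, n, k, eps, rng):
    """Return True to accept f as a k-junta and False to reject it."""
    s = 24 * k * k
    rounds = math.ceil(12 * (k + 1) / eps)
    part_of = [rng.randrange(s) for _ in range(n)]  # random partition into s parts
    in_S = [True] * s                               # parts whose coordinates are still in S
    found = 0
    for _ in range(rounds):
        # RelevantTest(f, S): re-randomize only the coordinates that are in S
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) if in_S[part_of[i]] else x[i] for i in range(n)]
        if f(x) == f(y):
            continue
        part = find_relevant_part(f, part_of, s, x, y)
        in_S[part] = False
        found += 1
        if found > k:
            return False
    return True
```

Because every part returned by the binary search contains a relevant coordinate, a genuine k-junta is never rejected by this sketch, whatever the random choices.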

5.2.1  Main  Technical  Lemma 

We  begin  with  a  definition  that  extends  the  notion  of  juntas  with  respect  to  partitions  of 
the  coordinates. 

Definition 5.3 (Partition juntas). Let $\mathcal{I}$ be a partition of $[n]$. The function $f : \{0,1\}^n \to \{0,1\}$ is a k-part junta with respect to $\mathcal{I}$ if the relevant coordinates in $f$ are all contained in at most $k$ parts of $\mathcal{I}$. Conversely, $f$ is $\epsilon$-far from being a k-part junta with respect to $\mathcal{I}$ if for every set $J$ formed by taking the union of $k$ parts in $\mathcal{I}$, $\mathrm{Inf}_f(\bar{J}) \ge \epsilon$.

When $f$ is a k-junta, it is also a k-part junta with respect to any partition of $[n]$. The following lemma shows that when $f$ is far from being a k-junta and $\mathcal{I}$ is a sufficiently fine random partition, then with large probability $f$ is also far from being a k-part junta with respect to $\mathcal{I}$.

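Lemma 5.4. Let $\mathcal{I}$ be a random partition of $[n]$ with $s = 24k^2$ parts obtained by uniformly and independently assigning each coordinate to a part. With probability at least $\frac{5}{6}$, a function $f : \{0,1\}^n \to \{0,1\}$ that is $\epsilon$-far from being a k-junta is also $\frac{\epsilon}{2}$-far from being a k-part junta with respect to $\mathcal{I}$.

Proof. For $\tau > 0$, let $\mathcal{F}_\tau = \{J \subseteq [n] : \mathrm{Inf}_f(\bar{J}) < \tau\}$ be the family of all sets whose complements have influence less than $\tau$. For any two sets $J, K \in \mathcal{F}_{\epsilon/2}$, the sub-additivity of influence implies that

$$\mathrm{Inf}_f(\overline{J \cap K}) = \mathrm{Inf}_f(\bar{J} \cup \bar{K}) \le \mathrm{Inf}_f(\bar{J}) + \mathrm{Inf}_f(\bar{K}) < 2 \cdot \tfrac{\epsilon}{2} = \epsilon.$$

But $f$ is $\epsilon$-far from k-juntas, so by Lemma 2.21 every set $S \subseteq [n]$ of size $|S| \le k$ satisfies $\mathrm{Inf}_f(\bar{S}) \ge \epsilon$. This implies that $|J \cap K| > k$ and, since this argument applies to every pair of sets in the family, that $\mathcal{F}_{\epsilon/2}$ is a $(k+1)$-intersecting family.

Let us now consider two separate cases: when $\mathcal{F}_{\epsilon/2}$ contains a set of size less than $2k$, and when it does not. In the first case, let $J \in \mathcal{F}_{\epsilon/2}$ be one of the sets of size $|J| < 2k$. By the union bound, the probability that $J$ is completely separated by the partition $\mathcal{I}$ is at least $1 - 2k \cdot \left(\frac{2k}{s}\right) = \frac{5}{6}$. For every set $K \in \mathcal{F}_{\epsilon/2}$, we have $|J \cap K| \ge k + 1$. So when $J$ is completely separated by $\mathcal{I}$, no set $K$ in $\mathcal{F}_{\epsilon/2}$ is covered by the union of $k$ parts in $\mathcal{I}$. Therefore, with probability at least $\frac{5}{6}$, the function $f$ is $\frac{\epsilon}{2}$-far from k-part juntas with respect to $\mathcal{I}$, as we wanted to show.

Consider now the case where $\mathcal{F}_{\epsilon/2}$ contains only sets of size at least $2k$. Then we claim that $\mathcal{F}_{\epsilon/4}$ is a $2k$-intersecting family: otherwise, we could find sets $J, K \in \mathcal{F}_{\epsilon/4}$ such that $|J \cap K| < 2k$ and $\mathrm{Inf}_f(\overline{J \cap K}) \le \mathrm{Inf}_f(\bar{J}) + \mathrm{Inf}_f(\bar{K}) < \frac{\epsilon}{2}$, contradicting our assumption.

Let $J \subseteq [n]$ be the union of $k$ parts in $\mathcal{I}$. Since $\mathcal{I}$ is a random partition, $J$ is a random subset obtained by including each element of $[n]$ in $J$ independently with probability $p = \frac{k}{s} = \frac{1}{24k} < \frac{1}{2k+1}$. By Theorem 4.10,

$$\Pr[\mathrm{Inf}_f(\bar{J}) < \tfrac{\epsilon}{4}] = \Pr[J \in \mathcal{F}_{\epsilon/4}] = \mu_{\frac{1}{24k}}(\mathcal{F}_{\epsilon/4}) \le \left(\tfrac{1}{24k}\right)^{2k}.$$

Applying the union bound over the possible choices of $J$, we get that $f$ is $\frac{\epsilon}{4}$-close to a k-part junta with respect to $\mathcal{I}$ with probability at most

$$\binom{s}{k}\left(\tfrac{1}{24k}\right)^{2k} \le \left(\tfrac{es}{k}\right)^k \left(\tfrac{1}{24k}\right)^{2k} = \left(\tfrac{e}{24k}\right)^k < \tfrac{1}{6}.$$ □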
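The two probability estimates used in the proof of Lemma 5.4 are easy to evaluate exactly. The sketch below assumes the readings $1 - 2k \cdot \frac{2k}{s}$ for the separation bound and $\binom{s}{k}\left(\frac{1}{24k}\right)^{2k}$ for the union bound over the choices of $k$ parts, with $s = 24k^2$; both function names are illustrative.

```python
from fractions import Fraction
from math import comb


def separation_bound(k):
    """Lower bound 1 - 2k * (2k / s) on the probability that fewer than 2k
    coordinates all land in distinct parts, for s = 24 k^2."""
    s = 24 * k * k
    return 1 - Fraction(2 * k * 2 * k, s)


def union_bound(k):
    """Number of ways to choose k of the s parts, times (1 / (24 k))^(2k)."""
    s = 24 * k * k
    return comb(s, k) * Fraction(1, 24 * k) ** (2 * k)
```

With this choice of $s$ the first quantity equals $5/6$ for every $k \ge 1$, and the second is $1/24$ at $k = 1$ and shrinks rapidly as $k$ grows.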


5.2.2  Proof  of  Theorem  5.1 

We  are  now  ready  to  complete  the  analysis  of  the  JuntaTest  algorithm. 

Theorem 5.1 (Restated). It is possible to $\epsilon$-test the function $f : \{0,1\}^n \to \{0,1\}$ for the property of being a k-junta with $O(k/\epsilon + k \log k)$ queries.

Proof. We begin by determining the query complexity of the JuntaTest algorithm. At most $2r = 24(k+1)/\epsilon$ queries are made in the execution of line 3.1 of the algorithm, and at most $(k+1) \log s = (k+1) \log(24k^2)$ queries are made in line 3.1.1 of the algorithm. So the algorithm makes a total of $O(k/\epsilon + k \log k)$ queries to the input function.

The completeness of the JuntaTest algorithm is easy to establish: when the input function is a k-junta, it contains at most $k$ parts with relevant coordinates, so the algorithm always accepts the function. Therefore, the JuntaTest algorithm has one-sided error.

Finally, we analyze the soundness of the JuntaTest algorithm. By Lemma 5.4, with probability at least 5/6 a function $f$ that is $\epsilon$-far from being a k-junta is also $\epsilon/2$-far from being a k-part junta with respect to the random partition of the coordinates. When this is the case, the influence of $S$ is at least $\epsilon/2$ until $k + 1$ parts with relevant coordinates are identified. So the expected number of rounds required to identify $k + 1$ parts with relevant variables is at most $2(k+1)/\epsilon$. By Markov's Inequality, the probability that the algorithm does not identify $k + 1$ relevant parts in $12(k+1)/\epsilon$ rounds is at most 1/6, and by the union bound the overall probability that the JuntaTest algorithm fails to reject $f$ is at most 1/3. □
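To get a feel for the constants in the query count, the following sketch evaluates the worst-case number of queries for concrete parameters. It assumes that $r$ is rounded up to an integer, that each binary search costs $\lceil \log_2 s \rceil$ queries, and that at most $k + 1$ searches are run; the function name is illustrative.

```python
import math


def junta_test_query_bound(k, eps):
    """Worst-case queries: 2 per RelevantTest round plus at most k + 1 binary searches."""
    s = 24 * k * k
    rounds = math.ceil(12 * (k + 1) / eps)
    relevant_test_queries = 2 * rounds
    search_queries = (k + 1) * math.ceil(math.log2(s))
    return relevant_test_queries + search_queries
```

For example, $k = 2$ and $\epsilon = 1/4$ give $2 \cdot 144 + 3 \cdot 7 = 309$ queries, independent of the number $n$ of variables.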


5.3  Notes  and  Discussion 

We  conclude  this  chapter  by  discussing  some  implications  of  Theorem  5.1,  the  problem  of 
testing  non-boolean  functions  for  the  property  of  being  a  junta,  and  non-adaptive  testing 
of  juntas. 


Implications. We mentioned in the introduction that one of the motivations for studying the junta testing problem is the testing by implicit learning method of Diakonikolas et al. [46], which uses junta testers to obtain efficient testing algorithms for a variety of other properties. Our hope was that a more query-efficient junta test would lead to improved upper bounds on the query complexity for a number of other properties of boolean functions.

Chakraborty, García Soriano, and Matsliah showed that the JuntaTest algorithm presented in this chapter does indeed lead to improved upper bounds for the query complexity of other properties of boolean functions [39]. Their work introduces an efficient "sample extractor" for juntas that, combined with the JuntaTest algorithm, improves the query complexity for testing computability by small DNFs, decision trees, boolean circuits, or branching programs, for testing low Fourier degree, and for testing sparse polynomials. We summarize their results in Table 5.1.

Property                     Upper bound      Previous upper bound    Lower bound
s-term DNFs                  $O(s \log s)$    $O(s^4)$ [46]           $\Omega(\log s)$ [39]
s-term monotone DNFs         $O(s \log s)$    $O(s^2)$ [84]           ?
Size-s boolean formulae      $O(s \log s)$    $O(s^4)$ [46]           $s^{\Omega(1)}$ [39]
Size-s branching programs    $O(s \log s)$    $O(s^4)$ [46]           $\Omega(s)$ (§7)
Size-s decision trees        $O(s \log s)$    $O(s^4)$ [46]           $\Omega(\log s)$ (§7)
Size-s boolean circuits      $O(s^2)$         $O(s^6)$ [46]           $s^{\Omega(1)}$ [39]
s-sparse polynomials         $O(s \log s)$    $O(s^4)$ [46]           $\Omega(s)$ (§7)
Fourier degree $\le d$       $O(2^{2d})$      $O(2^{6d})$ [46]        $\Omega(d)$ [40]

Table 5.1: Results obtained by Chakraborty, García Soriano, and Matsliah [39] by combining an efficient sample extractor with the JuntaTest algorithm.

Non-adaptive testing. The JuntaTest algorithm presented in this chapter is adaptive; indeed, adaptivity is the key component that enables the FindRelevantPart algorithm to require only $O(\log k)$ queries. A natural question to ask is whether we can also test juntas non-adaptively with $O(k \log k)$ queries. That question remains open. The best algorithm for testing juntas non-adaptively requires $O(k^{3/2})$ queries [19], and the best lower bound for non-adaptive testing of juntas is $\Omega(k \log k)$ queries.


Testing general functions. Let $X_1, \ldots, X_n, Y$ be arbitrary finite sets. The notion of juntas can be extended to functions $f : X_1 \times \cdots \times X_n \to Y$ over general product domains, and we may ask if such functions can also be tested efficiently for the property of being a k-junta.

Indeed, they can. The JuntaTest algorithm works essentially as-is in the more general setting and, with the appropriate generalizations of the notion of influence, the same analysis applies mostly as-is as well. For the details, we refer the reader to [20].

Chapter 6

Testing Partial Symmetry


The main result of the last chapter showed that juntas are efficiently testable. Looking back on this result, we may ask: can we explain why the k-junta property is so efficiently testable? In other words, what characteristics of the junta property did we use to obtain an efficient tester?

Intuitively, it seems that the main characteristic of juntas that lets us test the corresponding property efficiently is that all the irrelevant variables are "essentially the same". More precisely, juntas are invariant under any relabeling of the irrelevant variables. This characteristic is useful, notably, in arguing that a partition of $[n]$ is "good" when it completely separates the set of relevant variables.

It is not immediately clear, however, how accurate this intuition is. As a test for this intuition, we examine in this chapter the problem of testing partially symmetric functions. For a fixed $2 \le t \le n$, the function $f : \{0,1\}^n \to \{0,1\}$ is t-symmetric if there is a set $S$ of $|S| = t$ coordinates for which $f$ is invariant under every relabeling of the variables in $S$. In the informal terms of the last paragraph, the function $f$ is t-symmetric if it has $t$ variables that are "essentially the same". The set of $(n-k)$-symmetric functions includes all k-juntas as well as many other functions. (For example, the parity function $x_1 \oplus \cdots \oplus x_n$ is t-symmetric for every $2 \le t \le n$.)
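For small $n$, partial symmetry can be checked by brute force, which makes the two examples above easy to verify. The sketch below uses the fact that invariance under every relabeling of the variables in $S$ is equivalent to invariance under every swap of two coordinates in $S$, since transpositions generate all permutations. The function names are illustrative and the running time is exponential in $n$.

```python
from itertools import combinations, product


def is_symmetric_on(f, n, S):
    """True if f is unchanged by swapping any two coordinates in S, on every input."""
    for x in product((0, 1), repeat=n):
        for i, j in combinations(sorted(S), 2):
            if x[i] != x[j]:
                y = list(x)
                y[i], y[j] = y[j], y[i]
                if f(x) != f(tuple(y)):
                    return False
    return True


def is_t_symmetric(f, n, t):
    """True if some set of t coordinates is symmetric in f."""
    return any(is_symmetric_on(f, n, S) for S in combinations(range(n), t))
```

On 5 variables, the 2-junta $x_0 \wedge x_1$ (coordinates numbered from 0) is 3-symmetric but not 4-symmetric, while parity is 5-symmetric.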

We show that we can indeed test $(n-k)$-symmetry as efficiently as we can test k-juntas.

Theorem  6.1.  Fix  n  >  0  and  0  <  k  <  The  property  of  being  (n  —  k)- symmetric  is 
$\epsilon$-testable with $O(k \log k + k/\epsilon)$ queries.

The algorithm used to test partially symmetric functions is similar to the junta testing algorithm, but the analysis is more involved. An important ingredient of the analysis is a new notion of influence, called "symmetric influence", that we introduce in Section 6.2.


6.1  The  Algorithm 

To create an algorithm for testing partial symmetry, we extend the JuntaTest algorithm from the last chapter. The first component of the JuntaTest was the RelevantTest algorithm that, given a function $f$ and a set $S$ of coordinates, aimed to determine if $S$ contained a variable that was relevant in $f$. The analogous algorithm for our present purposes is the AsymmetricTest, which aims to determine whether or not $f$ is invariant under every relabeling of the variables in $S$.


AsymmetricTest($f$, $S$)

1. Generate $x \in \{0,1\}^n$ uniformly at random.

2. Generate the permutation $\pi$ on $[n]$ uniformly at random conditioned
   on $\pi(i) = i$ for each $i \in [n] \setminus S$.

3. If $f(x) \ne f(\pi x)$, accept and return $(x, \pi x)$.

4. Else, reject.


When $f$ is invariant under all permutations of the labels of the variables in $S$,
AsymmetricTest always rejects. Furthermore, when the test accepts, it also returns a witness
$(x, y)$ of the asymmetry of $S$ in $f$. The probability that the test accepts when $S$ is
asymmetric, however, is not immediately clear. As we will see in Lemma 6.7, this probability
is determined by the notion of symmetric influence that we define in the next section.
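
One round of AsymmetricTest can be sketched in Python as follows. This is only an illustration: the name asymmetric_test, the 0-indexed coordinates, the representation of inputs as bit lists, and the rng parameter are our own assumptions, not part of the original algorithm description.

```python
import random

def asymmetric_test(f, n, S, rng=random):
    """One round of AsymmetricTest: return a witness (x, pi_x) or None (reject)."""
    # Step 1: draw x uniformly at random from {0,1}^n.
    x = [rng.randint(0, 1) for _ in range(n)]
    # Step 2: draw a uniform permutation that fixes every coordinate outside S.
    positions = sorted(S)
    shuffled = positions[:]
    rng.shuffle(shuffled)
    pi_x = x[:]
    for src, dst in zip(positions, shuffled):
        pi_x[dst] = x[src]  # the value at coordinate src moves to coordinate dst
    # Steps 3-4: accept (with a witness) only if the two values of f differ.
    if f(x) != f(pi_x):
        return tuple(x), tuple(pi_x)
    return None
```

Because the permutation only moves values among the coordinates of S, any witness returned has the same Hamming weight on both strings and agrees outside S.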

The second main component of the JuntaTest was the FindRelevantPart algorithm.
Once again, we can define a similar algorithm for finding a part that is asymmetric in
$f$. Recall that in the last chapter, the algorithm found a relevant part by performing a binary
search over the hybrid strings between a pair of elements $x, y$ that satisfy $f(x) \ne f(y)$.
In the present situation, our task is complicated by the fact that we must perform a binary
search over elements $z$ with the same Hamming weight as $x$ and $y$. We do so by treating
one of the parts as a “workspace” that we use to control the Hamming weight of the hybrid
strings. We then consider a binary search in which the hybrid strings we generate have
Hamming weight close to that of $x$ and $y$. The following BalancedSplit algorithm is a
key component that lets us achieve this goal.

BalancedSplit($S$, $a_1, \ldots, a_n$, $r_{min}$, $r_{max}$)

1. Initialize $T \leftarrow \emptyset$.

2. While $|T| < \lfloor |S|/2 \rfloor$,

   2.1. Find $j \in S \setminus T$ that satisfies $r_{min} \le a_j + \sum_{i \in T} a_i \le r_{max}$.

   2.2. Update $T \leftarrow T \cup \{j\}$.

3. Return $T$.
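
The greedy loop of BalancedSplit can be sketched in Python as follows. This is a hedged illustration: the name balanced_split, the dictionary representation of the values $a_j$, and the stopping size of half of $|S|$ (rounded down) are our assumptions, the last one suggested by the halving argument used later in the analysis. When no admissible index exists, the sketch raises an error rather than guessing.

```python
def balanced_split(S, a, r_min, r_max):
    """Greedy sketch of BalancedSplit; `a` maps each index j to its value a_j."""
    T, total = [], 0
    remaining = list(S)
    while len(T) < len(S) // 2:
        # Step 2.1: find any j that keeps the running sum inside [r_min, r_max].
        j = next((j for j in remaining if r_min <= total + a[j] <= r_max), None)
        if j is None:
            raise ValueError("no admissible index; the preconditions do not hold")
        # Step 2.2: add j to T and update the running sum.
        T.append(j)
        remaining.remove(j)
        total += a[j]
    return T
```

For example, with values 2, -2, 1, -1, 2, -2 on indices 0 to 5 and the range [-2, 2], the sketch returns the first three indices, whose partial sums 2, 0, 1 all stay in range.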


We are now ready to complete the definition of the FindAsymmetricPart algorithm.
When (a) the part $I_s$ passed into this function is larger than the other parts and
contains no asymmetric variable and (b) $x$ and $y$ satisfy $\|x\| = \|y\|$ and $f(x) \ne f(y)$,
this algorithm identifies a part that contains an asymmetric variable. (The analysis of this
algorithm will be completed in Lemma 6.9 in Section 6.3.)

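FindAsymmetricPart($f$, $x$, $y$, $I_1, \ldots, I_s$)

1. For $i = 0, 1, \ldots, |I_s|$,

   1.1. Set $w^i$ to be a string satisfying $\|w^i_{I_s}\| = i$ and $\|w^i_{\overline{I_s}}\| = 0$.

2. Fix $r_{min} = -\|x_{I_s}\|$ and $r_{max} = |I_s| - \|x_{I_s}\|$.

3. For $j = 1, \ldots, s-1$,

   3.1. Set $a_j = \|x_{I_j}\| - \|y_{I_j}\|$.

4. Initialize $J_x \leftarrow \emptyset$, $J_y \leftarrow \emptyset$, and $J_? \leftarrow [s-1]$.

5. While $|J_?| > 1$,

   5.1. Set $J' \leftarrow$ BalancedSplit($J_?$, $a_1, \ldots, a_{s-1}$, $r_{min}$, $r_{max}$).

   5.2. Set $S \leftarrow \bigcup_{j \in J_x \cup J'} I_j$ and $T \leftarrow \bigcup_{j \in J_y \cup (J_? \setminus J')} I_j$.

   5.3. Set $z \leftarrow x_S \vee y_T \vee w^{\|x\| - \|x_S \vee y_T\|}$.

   5.4. If $f(x) = f(z)$, then

        5.4.1. Update $J_x \leftarrow J_x \cup J'$ and $J_? \leftarrow J_? \setminus J'$.

   5.5. Else

        5.5.1. Update $J_y \leftarrow J_y \cup (J_? \setminus J')$ and $J_? \leftarrow J'$.

6. Return $I_j$ for the element $j$ in the singleton set $J_? = \{j\}$.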


Finally, we obtain the PartialSymmetryTest by combining the two components
AsymmetricTest and FindAsymmetricPart in the natural way. That is, we generate
random pairs of elements $x, y \in \{0,1\}^n$ with the same Hamming weight and use
AsymmetricTest to determine if $x$ and $y$ form an asymmetry witness for $f$. If so, we
use FindAsymmetricPart to identify one of the parts that contains an asymmetric variable.
We reject the function if we identify more than $k$ parts with asymmetric variables.
The resulting algorithm is very similar to the JuntaTest. It is presented in Figure 6.1.


PartialSymmetryTest($f$, $k$, $\epsilon$)

Additional parameters: $s = \Theta(k^2/\epsilon^2)$, $r = \Theta(k/\epsilon)$

1. Randomly partition $[n]$ into $\{I_1, \ldots, I_{s-1}, I'_s, I'_{s+1}, I'_{s+2}, I'_{s+3}\}$.

2. Set $I_s \leftarrow I'_s \cup I'_{s+1} \cup I'_{s+2} \cup I'_{s+3}$.

3. If $|I_i| > \frac{1}{2}|I_s|$ for some $i \in [s-1]$, fail.

4. Initialize $S \leftarrow [n] \setminus I_s$ and $\ell \leftarrow 0$.

5. For each of $r$ rounds,

   5.1. If AsymmetricTest($f$, $S$) accepts and returns $(x, y)$, then

        5.1.1. $I \leftarrow$ FindAsymmetricPart($f$, $x$, $y$, $I_1, \ldots, I_s$).

        5.1.2. Update $S \leftarrow S \setminus I$ and $\ell \leftarrow \ell + 1$.

        5.1.3. If $\ell > k$, then reject the function.

6. Accept the function.

Figure 6.1: The algorithm for $\epsilon$-testing $(n-k)$-symmetric functions.

In the rest of the chapter, we analyze the PartialSymmetryTest algorithm to verify
that it is indeed a valid $\epsilon$-tester for $(n-k)$-symmetry. To complete this analysis,
however, we must first introduce a new measure of influence, which we call “symmetric
influence”.


6.2  Symmetric  influence 

Recall that our definition of influence of a set of variables measures the sensitivity of a
function to re-randomizing the values of the variables in that set. Similarly, we define
the symmetric influence of a set of variables to measure the sensitivity of a function to
reordering the values of the variables in that set or, equivalently, to relabeling the variables
in that set.

Definition 6.2 (Symmetric influence). The symmetric influence of a set $J \subseteq [n]$ of
variables in a Boolean function $f : \{0,1\}^n \to \{0,1\}$ is defined as

$$\mathrm{SymInf}_f(J) = \Pr_{x \in \{0,1\}^n,\ \pi \in S_n}\big[ f(x) \ne f(\pi x) \;\big|\; \forall i \notin J,\ \pi(i) = i \big].$$


It follows from this definition that a function $f$ is t-symmetric iff there exists a set $J$
of size $|J| = t$ such that $\mathrm{SymInf}_f(J) = 0$. In fact, we can establish a much stronger
statement.

Lemma 6.3. Given a function $f : \{0,1\}^n \to \{0,1\}$ and a subset $J \subseteq [n]$, let $f_J$ be the
$J$-symmetric function closest to $f$. Then, the symmetric influence of $J$ satisfies

$$\mathrm{dist}(f, f_J) \le \mathrm{SymInf}_f(J) \le 2 \cdot \mathrm{dist}(f, f_J).$$


Proof. Write $\bar J = [n] \setminus J$. For every weight $0 \le w \le n$ and $z \in \{0,1\}^{\bar J}$, define the layer
$L^w_{\bar J \leftarrow z} := \{x \in \{0,1\}^n \mid \|x\| = w \wedge x_{\bar J} = z\}$ to be the set of vectors of Hamming
weight $w$ that agree with $z$ over the set $\bar J$ (notice that some of these layers may be empty).
Let $p^w_z$ be the fraction of the vectors in $L^w_{\bar J \leftarrow z}$ that one has to modify in order to make
the restriction of $f$ over $L^w_{\bar J \leftarrow z}$ constant (notice that $p^w_z \in [0, \frac{1}{2}]$ for every $z, w$).

With this notation, we can restate the definition of the symmetric influence of $J$ as
follows.

$$\mathrm{SymInf}_f(J) = \sum_w \sum_z \frac{|L^w_{\bar J \leftarrow z}|}{2^n} \cdot \Pr_{x \in \{0,1\}^n,\ \pi \in S_J}\big[ f(x) \ne f(\pi x) \;\big|\; x \in L^w_{\bar J \leftarrow z} \big] = \sum_w \sum_z \frac{|L^w_{\bar J \leftarrow z}|}{2^n} \cdot 2 p^w_z (1 - p^w_z).$$

This holds as in each such layer, the probability that $x$ and $\pi x$ would result in two different
outcomes is the probability that $x$ would be chosen out of the smaller part and $\pi x$ from
the complement, or vice versa.

The function $f_J$ can be obtained by modifying $f$ at a $p^w_z$ fraction of the inputs in each
layer $L^w_{\bar J \leftarrow z}$, as each layer can be addressed separately and we want to modify as few inputs
as possible. By this observation, we have the following equality.

$$\mathrm{dist}(f, f_J) = \sum_w \sum_z \frac{|L^w_{\bar J \leftarrow z}|}{2^n} \cdot p^w_z.$$

But since $1 - p^w_z \in [\frac{1}{2}, 1]$, we have that $p^w_z \le 2 p^w_z (1 - p^w_z) \le 2 p^w_z$ and therefore
$\mathrm{dist}(f, f_J) \le \mathrm{SymInf}_f(J) \le 2 \cdot \mathrm{dist}(f, f_J)$ as required. □
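
As a small sanity check of the lemma (an illustration added here, not part of the original proof), take $n = 2$, $J = \{1, 2\}$ and the dictator function $f(x) = x_1$. The only layer on which $f$ is not constant is the weight-1 layer $\{01, 10\}$, where half of the inputs must be modified, so $p = \frac{1}{2}$ there and $p = 0$ on the other two layers. The layer formulas then give

$$\mathrm{SymInf}_f(J) = \frac{2}{4} \cdot 2 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}, \qquad \mathrm{dist}(f, f_J) = \frac{2}{4} \cdot \frac{1}{2} = \frac{1}{4},$$

so $\mathrm{dist}(f, f_J) \le \mathrm{SymInf}_f(J) \le 2 \cdot \mathrm{dist}(f, f_J)$ holds, with the lower bound tight in this case.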

An immediate corollary of Lemma 6.3 gives a characterization of the symmetric influence
of functions that are far from partially symmetric.

Corollary 6.4. Let $f : \{0,1\}^n \to \{0,1\}$ be a function that is $\epsilon$-far from being t-symmetric.
Then for every set $J \subseteq [n]$ of size $|J| \ge t$, $\mathrm{SymInf}_f(J) \ge \epsilon$ holds.

Proof. Fix $J \subseteq [n]$ of size $|J| \ge t$ and let $g$ be a $J$-symmetric function closest to $f$. Since $g$
is symmetric on any subset of $J$, it is in particular t-symmetric and therefore $\mathrm{dist}(f, g) \ge \epsilon$
as $f$ is $\epsilon$-far from being t-symmetric. Thus, by Lemma 6.3, $\mathrm{SymInf}_f(J) \ge \mathrm{dist}(f, g) \ge \epsilon$
holds. □


The analysis of the JuntaTest relied crucially on the fact that the notion of influence
is both monotone and sub-additive (Theorem 2.19). The following lemma shows that
symmetric influence is also monotone.

Lemma 6.5 (Monotonicity). For any function $f : \{0,1\}^n \to \{0,1\}$ and any sets
$J \subseteq K \subseteq [n]$,

$$\mathrm{SymInf}_f(J) \le \mathrm{SymInf}_f(K).$$

Proof. Fix a function $f$ and two sets $J, K \subseteq [n]$ so that $J \subseteq K$. We have seen before that
the symmetric influence can be computed in layers, where each layer is determined by the
Hamming weight and the elements outside the set we are considering. Using the fact that
$\mathrm{Var}(X) = \Pr[X = 0] \cdot \Pr[X = 1]$, the symmetric influence is twice the expected variance
over all the layers (considering also the size of the layers). Using the same notation as
before,

$$\mathrm{SymInf}_f(J) = \sum_w \sum_z \frac{|L^w_{\bar J \leftarrow z}|}{2^n} \cdot 2 p^w_z (1 - p^w_z) = 2 \cdot \mathbb{E}_y\Big[ \mathrm{Var}_x\big[ f(x) \;\big|\; x \in L^{\|y\|}_{\bar J \leftarrow y_{\bar J}} \big] \Big].$$

A key observation is that since $J \subseteq K$, the layers determined when considering $J$ are
a refinement of the layers determined when considering $K$. Together with the fact that
$\mathrm{Var}(X) = \Pr[X = 0] \cdot \Pr[X = 1]$ is a concave function of $\Pr[X = 1]$ in the range $[0, 1]$, we can
apply Jensen’s inequality on each layer before and after the refinement to get the desired
inequality. More precisely, for every $z \in \{0,1\}^{\bar K}$ and $0 \le w \le n$,

$$\mathrm{Var}_x\big[ f(x) \;\big|\; x \in L^w_{\bar K \leftarrow z} \big] \ge \mathbb{E}_y\Big[ \mathrm{Var}_x\big[ f(x) \;\big|\; x \in L^w_{\bar J \leftarrow y_{\bar J}} \big] \;\Big|\; y \in L^w_{\bar K \leftarrow z} \Big].$$

Averaging this over all layers, we get the desired result. □

Finally, to complete the analogy with influence, we would like to show that symmetric
influence is also sub-additive. Unfortunately, that is not the case. Consider for example
the function $f : \{0,1\}^6 \to \{0,1\}$ defined by

$$f(x) = (x_1 \vee x_2 \vee x_3) \oplus (x_4 \vee x_5 \vee x_6).$$

This function $f$ is invariant under relabelings of the variables in $\{1, 2, 3\}$ or in $\{4, 5, 6\}$,
so $\mathrm{SymInf}_f(\{1, 2, 3\}) = \mathrm{SymInf}_f(\{4, 5, 6\}) = 0$. But $\mathrm{SymInf}_f(\{1, \ldots, 6\}) > 0$, as
shown for instance by the fact that $f(1, 1, 1, 0, 0, 0) \ne f(1, 1, 0, 1, 0, 0)$. Therefore, for
this function, we have

$$\mathrm{SymInf}_f(\{1, 2, 3\} \cup \{4, 5, 6\}) > 0 = \mathrm{SymInf}_f(\{1, 2, 3\}) + \mathrm{SymInf}_f(\{4, 5, 6\}).$$
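
The counterexample can be checked by brute force. The following Python sketch computes the symmetric influence exactly by enumerating every input and every permutation of the coordinates in $J$; the name sym_inf, the 0-indexed coordinates and the tuple representation of inputs are assumptions made for this illustration, and the enumeration is only feasible for very small $n$.

```python
from fractions import Fraction
from itertools import permutations, product

def sym_inf(f, n, J):
    """Exact symmetric influence of J in f, by enumerating all x and all pi on J."""
    J = sorted(J)
    disagree = total = 0
    for x in product((0, 1), repeat=n):
        for image in permutations(J):
            y = list(x)
            for src, dst in zip(J, image):
                y[dst] = x[src]  # pi moves the value at src to dst; others stay fixed
            disagree += f(x) != f(tuple(y))
            total += 1
    return Fraction(disagree, total)

def f(x):
    # The example above: (x1 OR x2 OR x3) XOR (x4 OR x5 OR x6), 0-indexed here.
    return (x[0] | x[1] | x[2]) ^ (x[3] | x[4] | x[5])

left = sym_inf(f, 6, {0, 1, 2})
right = sym_inf(f, 6, {3, 4, 5})
union = sym_inf(f, 6, range(6))
```

Here left and right are both zero while union is strictly positive, which is exactly the failure of sub-additivity described above.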

More generally, we can consider the function $f(x) = f_1(x_J) \oplus f_2(x_K)$ for some
partition $[n] = J \cup K$ and two randomly chosen symmetric functions $f_1$ and $f_2$. With
high probability, $f$ will be very far from symmetric and we will have $\mathrm{SymInf}_f([n]) =
\mathrm{SymInf}_f(J \cup K) \approx \frac{1}{2}$ while $\mathrm{SymInf}_f(J) = \mathrm{SymInf}_f(K) = 0$. This example shows that
symmetric influence can be far from sub-additive in general. We can show, however, that
the symmetric influence of sets of variables is approximately sub-additive when the sets
are small enough.

Lemma 6.6 (Weak sub-additivity). There is a universal constant $c$ such that, for any
constant $0 < \gamma < 1$, a function $f : \{0,1\}^n \to \{0,1\}$, and sets $J, K \subseteq [n]$ of size at least
$(1 - \gamma)n$,

$$\mathrm{SymInf}_f(J \cup K) \le \mathrm{SymInf}_f(J) + \mathrm{SymInf}_f(K) + c\sqrt{\gamma}.$$


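Proof. We will choose $c \ge \sqrt{2}$. Then, the right-hand side is at least one when $\gamma \ge \frac{1}{2}$ and
the inequality trivially holds. Thus, we assume $\gamma < \frac{1}{2}$ in what follows. We use the same
notions as in Lemma 4.5. From Lemma 4.5,

$$\begin{aligned}
\mathrm{SymInf}_f(J \cup K) &= \Pr_{x, \pi}[f(x) \ne f(\pi x)] \\
&= \mathbb{E}_x\Big[ \Pr_{\pi}[f(x) \ne f(\pi x)] \Big] \\
&\le \mathbb{E}_x\Big[ \Pr_{\pi_J, \pi_K}[f(x) \ne f(\pi_J \pi_K x)] + d_{TV}(\mathcal{D}_{\pi x}, \mathcal{D}_{\pi_J \pi_K x}) \Big] \\
&= \mathbb{E}_x\Big[ \Pr_{\pi_J, \pi_K}[f(x) \ne f(\pi_J \pi_K x)] + d_{TV}\big(\mathcal{H}_{|J \cup K|, |x_{J \cup K}|, |K \setminus J|},\ \mathcal{H}_{|K|, |x_K|, |K \setminus J|}\big) \Big].
\end{aligned}$$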

Let $t = \frac{1}{100\sqrt{\gamma}}$. We call $x$ strongly balanced if

$$\Big| \|x_{J \cup K}\| - \frac{|J \cup K|}{2} \Big| \le t\sqrt{|J \cup K|} \quad \text{and} \quad \Big| \|x_K\| - \frac{|K|}{2} \Big| \le t\sqrt{|K|}.$$

From the Chernoff bound and the union bound, $x$ is strongly balanced except with
probability at most $4\exp(-2t^2) = 4\exp\big(-\frac{1}{5000\gamma}\big) \le c'\gamma$ for some constant $c'$.

Note that $|K \setminus J| \le |\bar J| \le \gamma n$ and $|J \cup K| - |K| = |J \setminus K| \le |\bar K| \le \gamma n$. Also,
t  =  1001^  <  holds.  Thus,  when  x  is  strongly  balanced,  Lemma  4.2  implies  that 

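$d_{TV}\big(\mathcal{H}_{|J \cup K|, |x_{J \cup K}|, |K \setminus J|},\ \mathcal{H}_{|K|, |x_K|, |K \setminus J|}\big) \le c_{4.2}(1 + t)\gamma$. Then, we have

$$\begin{aligned}
\mathrm{SymInf}_f(J \cup K) &\le \mathbb{E}_x\Big[ \Pr_{\pi_J, \pi_K}[f(x) \ne f(\pi_J \pi_K x)] \Big] + \Pr_x[x \text{ is not strongly balanced}] \\
&\quad + \mathbb{E}_x\big[ \mathbf{1}[x \text{ is strongly balanced}] \cdot c_{4.2}(1 + t)\gamma \big] \\
&\le \Pr_{x, \pi_J, \pi_K}[f(x) \ne f(\pi_K x)] + \Pr_{x, \pi_J, \pi_K}[f(\pi_K x) \ne f(\pi_J \pi_K x)] + c'\gamma + c_{4.2}(1 + t)\gamma \\
&\le \mathrm{SymInf}_f(J) + \mathrm{SymInf}_f(K) + c\sqrt{\gamma}
\end{aligned}$$

for some constant $c$. □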


6.3  Analysis  of  the  algorithm 

This section is devoted to the analysis of the PartialSymmetryTest algorithm. The
broad outline of this analysis closely mirrors that of the analysis of the JuntaTest
algorithm in the last chapter.


6.3.1  Analysis  of  AsymmetricTest 

As a first step in the analysis of the partial symmetry tester, we show that the
AsymmetricTest accepts a set with probability equal to the symmetric influence of that set.
The proof of this statement follows almost immediately from the definition of symmetric
influence.

Lemma 6.7. For any function $f : \{0,1\}^n \to \{0,1\}$ and any set $S \subseteq [n]$, a call to
AsymmetricTest($f$, $S$) accepts with probability $\mathrm{SymInf}_f(S)$.

Proof. The AsymmetricTest algorithm accepts iff $f(x) \ne f(\pi x)$. When $x$ is chosen
uniformly at random from $\{0,1\}^n$ and $\pi$ is a permutation on $[n]$ chosen uniformly at
random conditioned on $\pi(i) = i$ for each $i \notin S$, the probability that $f(x)$ and $f(\pi x)$ are
distinct is, by definition, $\mathrm{SymInf}_f(S)$. □

6.3.2  Analysis  of  FindAsymmetricPart


To establish the correctness of FindAsymmetricPart, we begin by establishing
sufficient conditions for guaranteeing that the BalancedSplit algorithm will succeed in
finding a balanced split of the input set.

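Lemma 6.8. Fix a set $S \subseteq [n]$ and constants $r_{min}, r_{max}$ that satisfy $r_{min} \le 0 \le r_{max}$. Let
$a_1, \ldots, a_n$ satisfy $\sum_{i=1}^{n} a_i = 0$, $r_{min} \le \sum_{i \in S} a_i \le r_{max}$, and $\max_{i \in [n]} |a_i| \le \frac{r_{max} - r_{min}}{2}$.
Then BalancedSplit returns a set $T$ of size $|T| = \lfloor |S|/2 \rfloor$ that satisfies

$$r_{min} \le \sum_{i \in T} a_i \le r_{max}.$$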

Proof. To establish the lemma, it suffices to show that for any strict subset $T \subset S$ that
satisfies $r_{min} \le \sum_{i \in T} a_i \le r_{max}$, we can always find an element $j \in S \setminus T$ such that

$$r_{min} \le a_j + \sum_{i \in T} a_i \le r_{max}.$$

Consider the case when $\sum_{i \in T} a_i \le r_{mid} := \frac{r_{min} + r_{max}}{2}$. If there exists $j \in S \setminus T$ such that
$a_j \ge 0$, then

$$r_{min} \le \sum_{i \in T} a_i \le a_j + \sum_{i \in T} a_i \le \frac{r_{max} - r_{min}}{2} + \frac{r_{max} + r_{min}}{2} = r_{max}.$$

Conversely, if every $j \in S \setminus T$ satisfies $a_j < 0$, for any such $j$ we have

$$r_{max} \ge \sum_{i \in T} a_i \ge a_j + \sum_{i \in T} a_i \ge \sum_{i \in S} a_i \ge r_{min}.$$

In either case, we obtain the desired inequality.

The case when $\sum_{i \in T} a_i > r_{mid}$ is nearly identical. □


We  can  now  complete  the  proof  of  correctness  of  FindAsymmetricPart. 

Lemma 6.9. Fix $f : \{0,1\}^n \to \{0,1\}$. Let $\mathcal{I} = \{I_1, \ldots, I_s\}$ be a partition of $[n]$
into $s$ parts such that $|I_s| \ge 2 \max_{i \in [s-1]} |I_i|$ and $I_s$ contains no asymmetric variables.
Let $x, y \in \{0,1\}^n$ satisfy $\|x\| = \|y\|$, $x_{I_s} = y_{I_s}$, and $f(x) \ne f(y)$. Then a call to
FindAsymmetricPart($f$, $x$, $y$, $I_1, \ldots, I_s$) returns an asymmetric part in $\mathcal{I}$.
Furthermore, this call requires $O(\log s)$ queries to the function $f$.


Proof. We claim that the sets $J_x$, $J_y$, and $J_?$ satisfy the following three properties at every
invocation of the loop in Step 5.

(i) $J_x$, $J_y$, and $J_?$ form a partition of $[s-1]$.

(ii) $f(x) = f(x_{J_x \cup J_?} \vee y_{J_y})$.

(iii) $f(y) = f(x_{J_x} \vee y_{J_y \cup J_?})$.

All three claims can be verified by direct examination of the algorithm. Claim (i) is clearly
satisfied initially by the definition in Step 4 and is also clearly maintained by the updates
in Steps 5.4.1 and 5.5.1 since every element added to $J_x$ or to $J_y$ is simultaneously
removed from $J_?$. Claim (ii) is (vacuously) true initially and is guaranteed to remain true
since we only add elements to $J_x$ when the condition in Step 5.4 holds. The proof of Claim
(iii) is identical but for the observation that we add elements to $J_y$ when the condition in
Step 5.4 does not hold or, equivalently, when $f(y) = f(z)$.

The invariants in Claims (i)-(iii) guarantee that at every iteration of the loop, we have

$$f\big(x_{J_x} \vee x_{J_?} \vee y_{J_y} \vee w^{\|x\| - \|x_{J_x} \vee x_{J_?} \vee y_{J_y}\|}\big) \ne f\big(x_{J_x} \vee y_{J_?} \vee y_{J_y} \vee w^{\|x\| - \|x_{J_x} \vee y_{J_?} \vee y_{J_y}\|}\big).$$

When $J_? = \{j\}$ is a singleton, this inequality and the condition that $I_s$ contains no
asymmetric variable guarantee that $I_j$ is an asymmetric part. Furthermore, the conditions of
the lemma also guarantee that the conditions of Lemma 6.8 hold. This guarantees that
every iteration of the loop cuts the size of $J_?$ in half, so that the algorithm is guaranteed to
terminate in $\log s$ steps. Since the only queries to $f$ occur in Step 5.4, this concludes the
query complexity analysis of the algorithm. □

6.3.3  Main  technical  lemma 

The key claim in the analysis of PartialSymmetryTest is that if a function is far from
being $(n-k)$-symmetric, then it is also far from being symmetric on any union of all but $k$
parts of a sufficiently fine random partition. (Cf. Lemma 5.4.) We now prove this claim.

Lemma 6.10. Let $f : \{0,1\}^n \to \{0,1\}$ be a function $\epsilon$-far from $(n-k)$-symmetric and
$\mathcal{I}$ be a random partition of $[n]$ into $r = c \cdot k^2/\epsilon^2$ parts, for some large enough constant $c$.
Then  with  probability  at  least  |,  Syrnlnf  j(  J)  >  |  holds  for  any  union  J  ofk  parts. 

Proof. We first note that when the number of parts $r$ is bigger than $n$, we simply partition
into the $n$ single-element sets and the lemma trivially holds. For $0 \le t \le 1$, let $\mathcal{F}_t =
\{J \subseteq [n] : \mathrm{SymInf}_f(\bar J) < t\epsilon,\ |J| \le 5kn/r\}$ be the family of all sets which are not too
big and whose complement has symmetric influence of at most $t\epsilon$. (Notice that with high
probability, the union of any $k$ sets in the partition would have size smaller than $5kn/r$,
and therefore we assume this is the case from this point on.) Our first observation is that
for small enough values of $t$, $\mathcal{F}_t$ is a $(k+1)$-intersecting family. Indeed, for any sets
$J, K \in \mathcal{F}_{1/3}$,

$$\mathrm{SymInf}_f(\overline{J \cap K}) = \mathrm{SymInf}_f(\bar J \cup \bar K) \le \mathrm{SymInf}_f(\bar J) + \mathrm{SymInf}_f(\bar K) + c\sqrt{5k/r} < \frac{2\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$

Since $f$ is $\epsilon$-far from $(n-k)$-symmetric, every set $S \subseteq [n]$ of size $|S| \le k$ satisfies
$\mathrm{SymInf}_f(\bar S) \ge \epsilon$. So $|J \cap K| > k$.

We consider two cases separately: when $\mathcal{F}_{1/9}$ contains a set of size less than $2k$, and
when it does not. The first case is identical to the proof of Lemma 5.4 and hence we do
not elaborate on it.

In the second case, which also resembles the proof of Lemma 5.4, we claim that $\mathcal{F}_{1/9}$ is
a $2k$-intersecting family. If this was not the case, we could find sets $J, K \in \mathcal{F}_{1/9}$ such that
$|J \cap K| < 2k$ and $\mathrm{SymInf}_f(\overline{J \cap K}) \le \mathrm{SymInf}_f(\bar J) + \mathrm{SymInf}_f(\bar K) + \frac{\epsilon}{9} < \frac{\epsilon}{3}$, contradicting
our assumption.

Let  J  C  [n]  be  the  union  of  k  parts  in  X.  Since  X  is  a  random  partition,  J  is  a  random 
subset  obtained  by  including  each  element  of  [n]  in  J  independently  with  probability  p  = 
r  <  9/777-  bound  the  probability  that  J  contains  some  element  from  XT,  we  define 
XT  to  be  all  the  sets  that  contain  a  member  from  X" |.  Since  T'L  is  also  a  2/c-intersecting 

family,  by  Theorem  4.10,  for  every  such  J  of  size  at  most  5 kn/r,  Pr[SymInf j(X)  < 
|]  =  Pr[X  G  J7*]  <  Hk/r(F'e)  <  (^)2k ■  Applying  the  union  bound  over  all  possible 
choices  for  k  parts,  /  will  not  satisfy  the  condition  of  the  lemma  with  probability  at  most 

0  (fP  =  0(k-p  a 

6.3.4  Proof  of  Theorem  6.1 

We  can  now  complete  the  proof  that  partial  symmetry  is  efficiently  testable. 

Theorem 6.1 (Restated). Fix $n > 0$ and $0 \le k \le n$. The property of being $(n-k)$-symmetric
is $\epsilon$-testable with $O(k \log k + k/\epsilon)$ queries.

Proof. We consider different cases depending on the value of $k$. When $k = 0$, the problem
of testing $(n-k)$-symmetry is simply the problem of testing symmetry. This can be done
with $O(1/\epsilon)$ queries by calling AsymmetricTest($f$, $[n]$) a number of times and accepting
iff every call to the function rejects.

Consider now the case where $1 \le k \le o(\epsilon\sqrt{n/\log n})$. We show that in this case the
PartialSymmetryTest is a valid tester for partial symmetry that satisfies the requirements
of the theorem statement. Note first that $s = \Theta(k^2/\epsilon^2) \le \frac{n}{12 \log n} - 3$.

We claim that the failure event in Step 3 of PartialSymmetryTest occurs with
probability at most $\frac{1}{12}$. For any $i = 1, \ldots, s-1$, the expected size of $I_i$ is $\mathbb{E}|I_i| = \frac{n}{s+3}$ so,
by Chernoff’s bound,

$$\Pr\Big[ |I_i| \ge \frac{3n}{2(s+3)} \Big] \le e^{-\frac{n}{12(s+3)}}.$$

Similarly, $\mathbb{E}|I_s| = \frac{4n}{s+3}$ and Chernoff’s bound implies that

$$\Pr\Big[ |I_s| \le \frac{3n}{s+3} \Big] \le e^{-\frac{n}{8(s+3)}}.$$

The inequality $|I_i| > \frac{1}{2}|I_s|$ can only hold when $|I_i| \ge \frac{3n}{2(s+3)}$ for some $i \in [s-1]$ or when
$|I_s| \le \frac{3n}{s+3}$. By the union bound and our bound on $s$, the probability that this event occurs
is bounded above by

$$(s-1) e^{-\frac{n}{12(s+3)}} + e^{-\frac{n}{8(s+3)}} \le s\, e^{-\frac{n}{12(s+3)}} \le \frac{n}{12 \log n}\, e^{-\log n} = \frac{1}{12 \log n} \le \frac{1}{12}$$

for any $n \ge 2$.

When $f$ is $(n-k)$-symmetric, the probability that $I_s$ contains an asymmetric variable
is at most $k \cdot \frac{4}{s+3} = \Theta(\epsilon^2/k) \ll \frac{1}{12}$. When $I_s$ contains no asymmetric variable, then
Lemma 6.9 guarantees that every part returned by FindAsymmetricPart contains an
asymmetric variable. There are at most $k$ such parts, so this means that, conditioned on $I_s$
containing no asymmetric variable and the condition in Step 3 being satisfied, $f$ will always be
accepted. In other words, every $(n-k)$-symmetric function is accepted with probability
at least $1 - 2 \cdot \frac{1}{12} = \frac{5}{6} > \frac{2}{3}$.

When $f$ is $\epsilon$-far from $(n-k)$-symmetric, Lemma 6.10 guarantees that with probability
at  least  |  over  the  choice  of  the  random  partition,  Synilnf  j(  J)  >  |  will  hold  for  any  set  J 
obtained by taking the union of at most $k$ parts. Then Lemma 6.7 and Chernoff’s bound
proves  that  with  probability  at  least  |,  the  algorithm  will  identify  at  least  k  +  1  parts  with 
asymmetric  variables.  Therefore,  the  algorithm  correctly  rejects  with  probability  at  least 

$$1 - \frac{1}{12} - \frac{2}{9} - \frac{1}{36} = \frac{2}{3}.$$

The above argument shows that PartialSymmetryTest is indeed a valid $\epsilon$-tester
for $(n-k)$-symmetry, as claimed. To complete the proof of the theorem for this range
of values of $k$, it suffices to analyze the algorithm’s query complexity. By inspection,
we see immediately that it makes at most $O(r + k \log s) = O(k/\epsilon + k \log(k^2/\epsilon^2)) =
O(k/\epsilon + k \log k)$ queries, as required by the theorem statement.
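
The last simplification of the query bound can be spelled out as follows (a short worked step added for clarity; it only uses the elementary inequality $\log(1/\epsilon) \le 1/\epsilon$ for $0 < \epsilon \le 1$):

$$k \log\frac{k^2}{\epsilon^2} = 2k \log k + 2k \log\frac{1}{\epsilon} \le 2k \log k + \frac{2k}{\epsilon},$$

so that $O(k/\epsilon + k \log(k^2/\epsilon^2)) = O(k/\epsilon + k \log k)$.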

Finally, we now consider the case where $\Omega(\epsilon\sqrt{n/\log n}) \le k \le n$. In this case, consider a
simplification of the PartialSymmetryTest in which we replace the random partition
with a trivial partition into $n-3$ parts with 4 coordinates from $[n]$ chosen at random and
put into the part $I_{n-3}$ and all the other remaining parts containing a single element from
$[n]$. We can then apply the same strategy as in the general algorithm where we run the
AsymmetricTest to find witnesses of asymmetry, then run FindAsymmetricPart to
identify the parts that contain asymmetric variables.

The  analysis  of  correctness  of  the  simplified  algorithm  is  essentially  identical  to  the 
one  above.  □ 


6.4  Notes  and  Discussion 

History of partially symmetric functions. The class of partially symmetric functions
was first considered by Shannon [92] in his research on the circuit complexity of boolean
functions. He showed that while most functions $f : \{0,1\}^n \to \{0,1\}$ have circuit
complexity exponential in $n$, the circuit complexity of partially symmetric functions is much
smaller.

Shannon’s result generated much further research into the interplay between the partial
symmetry of functions and circuit complexity [36, 10] and the problem of determining
when a function is partially symmetric (see, e.g., [43] and references therein).

Other definitions of symmetry have been considered in the computational complexity
community. Clote and Kranakis [42] showed that all functions that obey a large amount
of symmetry, in the sense that the isomorphism class of those functions contains at most
poly(n) distinct functions, are in $\mathrm{NC}^1$. This result was further generalized by Babai,
Beals, and Takácsi-Nagy [12] and has been used, e.g., in proof complexity [85].

Functions that are partially symmetric under our definition also satisfy the Clote-Kranakis
definition of symmetry. While this does not immediately imply testability of
Clote-Kranakis symmetry (since, obviously, an efficient test for a property $\mathcal{P}$ does not
imply the existence of efficient testers for properties that contain $\mathcal{P}$), Chakraborty, Fischer,
García Soriano, and Matsliah recently showed that the two definitions are in fact
equivalent [37]. Thus, the theorem in this chapter shows that Clote-Kranakis symmetry is also
efficiently testable.


Concurrent work. Chakraborty, Fischer, García Soriano, and Matsliah [37], in
independent and simultaneous research, obtained a different proof that partial symmetry can
be tested efficiently. Interestingly, their result is obtained by a significantly different
approach: instead of generalizing the JuntaTest algorithm, they identify a clever reduction
between testing partial symmetry and testing juntas to show that the JuntaTest
algorithm can be used as-is (along with a separate algorithm to construct a function $g$ that is
a junta iff $f$ is partially symmetric) to also test partial symmetry. We refer the reader to
their paper [37] for the details.


Influence on Schreier graphs. O’Donnell and Wimmer [82, 83] introduced a generalized
notion of influence for functions defined over Schreier graphs. In this paragraph, we
discuss a connection between this generalized notion of influence and symmetric
influence. This connection was first pointed out to us by Dvir Falik.

For any set $U \subseteq S_n$ of permutations that is (a) closed under inverses, and (b) generates
$S_n$, we can define the Schreier graph $\mathrm{Sch}(\{0,1\}^n, S_n, U)$ to be the graph obtained by
associating one vertex with each element of $\{0,1\}^n$ and adding an edge between $x, y \in
\{0,1\}^n$ if there exists a permutation $\pi \in U$ such that $y = \pi x$. The Schreier influence of
the permutation $\pi \in U$ in the function $f : \{0,1\}^n \to \{0,1\}$ is

$$\mathrm{Inf}_f(\pi) = \tfrac{1}{2} \Pr_x[f(x) \ne f(\pi x)].$$

Fix $U = \{\tau_{i,j}\}_{i \ne j}$ to be the set of transpositions on $[n]$. Then for any pair of distinct
elements $i, j \in [n]$, we have

$$\mathrm{Inf}_f(\tau_{i,j}) = \tfrac{1}{2} \Pr_x[f(x) \ne f(\tau_{i,j} x)] = \Pr_{x, \pi}\big[ f(x) \ne f(\pi x) \;\big|\; \forall \ell \in [n] \setminus \{i, j\},\ \pi(\ell) = \ell \big] = \mathrm{SymInf}_f(\{i, j\}).$$

Thus, for sets of size 2, symmetric influence is a special case of Schreier influence.
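
To see where the factor $\frac{1}{2}$ comes from (a short step added for clarity, under the normalization $\mathrm{Inf}_f(\pi) = \frac{1}{2}\Pr_x[f(x) \ne f(\pi x)]$): a uniformly random permutation that fixes every coordinate outside $\{i, j\}$ is the identity with probability $\frac{1}{2}$ and the transposition $\tau_{i,j}$ with probability $\frac{1}{2}$, so

$$\mathrm{SymInf}_f(\{i, j\}) = \tfrac{1}{2} \Pr_x[f(x) \ne f(x)] + \tfrac{1}{2} \Pr_x[f(x) \ne f(\tau_{i,j} x)] = \tfrac{1}{2} \Pr_x[f(x) \ne f(\tau_{i,j} x)].$$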

More  generally,  the  symmetric  influence  of  larger  sets  of  variables  corresponds  to  the 
average  Schreier  influence  of  sets  of  permutations  on  the  same  function.  Specifically, 
letting  U  be  the  set  of  permutations  with  at  least  \J\  fixed  points,  we  have  that 

Symlnf  f(J)  =  E  [Inff^Tr)]. 

J  TrUi&J  1 

Invariance in property testing. Finally, we wish to mention that the research presented
in this chapter fits into the general effort for understanding the role of invariance in
property testing. This effort, launched by Kaufman and Sudan [72], has been widely successful
in identifying the effect of different invariances of algebraic properties of functions on the
number of queries required to test the same properties. We encourage the reader to consult
the survey [95] and the references therein for more details on this line of research.


Chapter  7

Testing  k-Linearity


Linearity testing is one of the earliest success stories in property testing. Recall that the
function $f : \{0,1\}^n \to \{0,1\}$ is linear if it returns the parity of a subset of the variables.
As we saw in Section 2.6, when $f$ is a linear function, it satisfies the identity

$$f(x) \oplus f(y) = f(x \oplus y)$$

for every $x, y \in \{0,1\}^n$. Blum, Luby, and Rubinfeld [31] showed that, remarkably,
linearity can be $\epsilon$-tested with only $O(1/\epsilon)$ queries by simply verifying that the above identity
holds for randomly selected pairs $x, y$. Linearity testing has since been studied
extensively [16, 17, 14, 71] and its query complexity is well understood.
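
The test of Blum, Luby, and Rubinfeld can be sketched in Python as follows. This is an illustration only: inputs are encoded as n-bit integers so that bitwise XOR plays the role of $\oplus$, and the name blr_linearity_test and the constant 3 in the number of trials are arbitrary assumptions rather than the constants of the original analysis.

```python
import math
import random

def blr_linearity_test(f, n, eps, rng=random):
    """Accept (True) iff f(x) ^ f(y) == f(x ^ y) on O(1/eps) random pairs."""
    trials = math.ceil(3 / eps)  # the constant 3 is an arbitrary illustrative choice
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if (f(x) ^ f(y)) != f(x ^ y):
            return False  # found a violated pair: f is not linear
    return True
```

A linear function passes every trial, so the sketch always accepts it; a function far from linear violates the identity on a noticeable fraction of pairs and is rejected with high probability.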

The class of k-linear functions (functions that return the parity of exactly $k$ of the
input variables) is closely related to the class of all linear functions. The query
complexity of the k-linearity testing task, however, remained until very recently much less
well-understood than that of linearity testing.


Previous work. Let us first review some folklore bounds on the query complexity for
testing $k$-linearity. As we saw in Section 3.2, any proper learning algorithm for $k$-linear
functions with query complexity $q$ yields a $k$-linearity tester that makes $q + O(1/\epsilon)$ queries.
There is a very simple $n$-query learning algorithm for $k$-linear functions: query the values
$f(e^1), f(e^2), \ldots, f(e^n)$. When $f$ is $k$-linear, $f(e^i) = 1$ for exactly $k$ indices $i \in [n]$ and the
function returns the parity of those variables. Thus, by Lemma 3.13, we can test $k$-linearity
with $n + O(1/\epsilon)$ queries for every $0 < k < n$.
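The folklore tester just described can be sketched as follows. This is a hedged illustration, assuming inputs are encoded as integers (bit $i$ is the $i$th coordinate) and using an arbitrary constant 3 for the verification phase; it is not the exact procedure of Lemma 3.13.

```python
import random

def folklore_k_linearity_test(f, n, k, eps, rng=None):
    """Learn-then-verify tester using n + O(1/eps) queries.

    Step 1 queries the n unit vectors to learn the only candidate parity.
    Step 2 checks that candidate against f on random inputs.
    """
    rng = rng or random.Random()
    # Learning phase: f(e^i) = 1 exactly on the support of a linear f.
    support = [i for i in range(n) if f(1 << i) == 1]
    if len(support) != k:
        return False
    mask = sum(1 << i for i in support)
    # Verification phase: compare f with the learned parity on random points.
    for _ in range(max(1, int(3 / eps))):
        x = rng.getrandbits(n)
        if f(x) != (bin(x & mask).count("1") & 1):
            return False
    return True
```

A $k$-linear function is always accepted. If $f$ is $\epsilon$-far from every $k$-linear function, then either the learned support has the wrong size or each random check fails with probability at least $\epsilon$.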

When  k  <  ,  a  similar  folkloric  argument  yields  an  even  better  bound  on  the  query 

complexity for testing $k$-linearity. There are $\binom{n}{k}$ distinct $k$-linear functions, so by
Corollary 3.14 we can test $k$-linearity with $O(\log\binom{n}{k}/\epsilon) = O(k \log(n)/\epsilon)$ queries.

There are a few values of $k$ for which we have significantly better bounds. First, when
$k = 0$ or $k = n$, the function is either constant or the symmetric parity function;
in both cases, testing $k$-linearity reduces to testing function identity and can be done with
$O(1/\epsilon)$ queries. When $k = 1$, we have an interesting special case: the class of 1-linear
functions is exactly the set of dictator functions. The dictator tests that require $O(1/\epsilon)$
queries [15, 84] imply that 1-linearity can also be tested with the same query complexity.

The first improvements on the folklore bound for the query complexity for testing
$k$-linearity for general values of $k$ were obtained by Fischer et al. [52]. One of their results
on testing function isomorphism, which we will discuss in more detail in Chapter 8, implies
that $k$-linearity can be tested with $\mathrm{poly}(k, 1/\epsilon)$ queries. Clearly, when $k \ll n$, this is much
more efficient than the folklore test.

The first non-trivial lower bound on the query complexity for testing $k$-linearity was
also established by Fischer et al. [52]. They showed that when $k = o(\sqrt{n})$, testing
$k$-linearity non-adaptively requires $\Omega(\sqrt{k}/\log k)$ queries. This implies a general lower
bound of $\Omega(\log k)$ queries for general (i.e., adaptive) $k$-linearity testers. Their motivation
for establishing this bound was not directly related to the study of $k$-linearity. Rather, they
wanted to establish a lower bound for the query complexity of the junta testing problem.
Since $k$-linear functions are $k$-juntas and $(k+2)$-linear functions are far from $k$-juntas,
to prove a lower bound on the query complexity for testing $k$-juntas it suffices to establish
a corresponding lower bound for the problem of distinguishing $k$-linear and $(k+2)$-linear
functions.

In Chapter 10, we will derive a lower bound that is incomparable to the lower bound
of Fischer et al. [52]: Theorem 10.1 gives a weaker bound of $\Omega(\log\log(\min\{k, n-k\}))$
queries for testing $k$-linearity, but that lower bound applies to all values of $k$. We will
discuss the theorem in more detail in that chapter. For now, let us simply point out that the
original motivation for this result was again not the study of $k$-linearity per se, but rather
to understand another property testing problem: in this case, the function isomorphism
testing problem.

A significant improvement on the best lower bound for the query complexity of testing
$k$-linearity was obtained by Goldreich [57]. He showed that $\Omega(\min\{k, n-k\})$ queries
are required to test $k$-linearity non-adaptively and that adaptive testers for $k$-linearity must
make at least $\Omega(\min\{\sqrt{k}, \sqrt{n-k}\})$ queries.¹ Once again, this result was not motivated by
the study of $k$-linearity itself. Instead, the lower bound was used to establish lower bounds
on the number of queries required to test properties computable by width-2 ordered binary
decision diagrams (OBDDs).²

¹ More precisely, Goldreich considered the slightly different problem of testing $\le k$-linearity, that is,
testing whether a function returns the parity of at most $k$ variables. His arguments, however, apply
equally well to the $k$-linearity testing problem.

The folklore tester described earlier in this section shows that Goldreich's lower bound
for the query complexity of non-adaptive $k$-linearity testers is asymptotically optimal when
$k \approx \frac{n}{2}$. The major question that was left open was whether the lower bound for adaptive
testers of $k$-linearity could also be improved to match the trivial query complexity of the
folklore tester in the same range. Indeed, Goldreich conjectured that it could: specifically,
that testing $\frac{n}{2}$-linearity requires $\Omega(n)$ queries.


Our results. We confirm Goldreich's conjecture and show that testing $k$-linearity
requires $\Omega(\min\{k, n-k\})$ queries. In fact, we show more. Instead of simply giving a lower
bound that is asymptotically equivalent to $\min\{k, n-k\}$, we show a lower bound that
nearly matches this amount.

Theorem 7.1. For any $0 < k < n$ and any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $k$-linearity requires at
least $\min\{k, n-k\} \cdot (1 - o(1))$ queries.

In particular, for $k = \frac{n}{2}$, the theorem states that no tester for $k$-linearity can improve
on the query complexity of the folklore testing algorithm by more than a factor of 2.

As we have seen in the overview of prior work, lower bounds on the query complexity
for testing $k$-linearity (or for related problems) yield lower bounds for the query
complexity of other property testing problems as well. As we show in Section 7.3, the lower bound
on testing $k$-linearity in Theorem 7.1 indeed gives improvements on the best-known lower
bounds for many other property testing problems.

The proof of Theorem 7.1 proceeds by reducing the problem of testing $k$-linearity to a
purely geometric problem on the boolean hypercube. We describe that geometric problem
and solve it in Section 7.1. We then complete the proof of the theorem in Section 7.2.


7.1  Affine  subspaces  and  layers  of  the  hypercube 

We reduce the problem of testing $k$-linear functions to a purely geometric problem on the
Hamming cube. Namely, we obtain our testing lower bound by showing that affine subspaces of large
dimension intersect roughly the same fraction of the middle layers of the cube. More
precisely, let $W_k \subseteq \{0,1\}^n$ denote the set of vectors $x \in \{0,1\}^n$ of Hamming weight $k$.
Our main technical contribution is the following result.

² See Section 7.3.7 for more details.

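Lemma 7.2. There is a constant $c > 0$ such that for any affine subspace $V \subseteq \{0,1\}^n$ of
dimension $d \ge \frac{n}{2} + cn^{2/3}$,

$$\left| \frac{|V \cap W_{n/2-1}|}{|W_{n/2-1}|} - \frac{|V \cap W_{n/2+1}|}{|W_{n/2+1}|} \right| \le \cdots$$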

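Proof. For any set $A \subseteq \{0,1\}^n$, define $I_A : \{0,1\}^n \to \{0,1\}$ to be the indicator function
for $A$. For a given function $f : \{0,1\}^n \to \{0,1\}$, let us write $\mathbb{E}[f]$ as shorthand for
$\mathbb{E}_x[f(x)]$ where the expectation is over the uniform distribution of $x \in \{0,1\}^n$. Similarly,
for two functions $f, g$, we write $\mathbb{E}[f \cdot g]$ as shorthand for $\mathbb{E}_x[f(x) \cdot g(x)]$.

For any subsets $A, B \subseteq \{0,1\}^n$, $|A \cap B| = 2^n \cdot \mathbb{E}[I_A \cdot I_B]$. Since $|W_{n/2-1}| = |W_{n/2+1}| = \binom{n}{n/2-1}$,

$$\frac{|V \cap W_{n/2-1}|}{|W_{n/2-1}|} - \frac{|V \cap W_{n/2+1}|}{|W_{n/2+1}|} = \frac{2^n}{\binom{n}{n/2-1}} \, \mathbb{E}\left[I_V \cdot \left(I_{W_{n/2-1}} - I_{W_{n/2+1}}\right)\right].$$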


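The subspace $V$ can be defined by a set $S \subseteq [n]$ of size $|S| = d$ and an affine-linear
function $f : \{0,1\}^{n-d} \to \{0,1\}^d$, where $x \in V$ iff $x_S = f(x_{\bar{S}})$. Define $I^S_m$ and $I^{\bar{S}}_m$ to be
indicator functions for $|x_S| = m$ and $|x_{\bar{S}}| = m$, respectively. Then

$$\mathbb{E}\left[I_V \cdot \left(I_{W_{n/2-1}} - I_{W_{n/2+1}}\right)\right] = \sum_{m=0}^{d} \mathbb{E}\left[I_V \cdot I^S_m \cdot \left(I^{\bar{S}}_{n/2-m-1} - I^{\bar{S}}_{n/2-m+1}\right)\right].$$

Let $U \subseteq \{0,1\}^S$ be the image of $f$. Let $d' = \dim(U)$. Define $h_m : \{0,1\}^S \to [-1, 1]$
by setting $h_m(u) = \mathbb{E}_{x \in \{0,1\}^{\bar{S}}}\left[I_V(x, u) \cdot \left(I^{\bar{S}}_{n/2-m-1}(x) - I^{\bar{S}}_{n/2-m+1}(x)\right)\right]$. Note that
$h_m = f_*\left(I^{\bar{S}}_{n/2-m-1} - I^{\bar{S}}_{n/2-m+1}\right)$. Notice also that $h_m$ is supported on $U$. We have

$$\mathbb{E}\left[I_V \cdot \left(I_{W_{n/2-1}} - I_{W_{n/2+1}}\right)\right] = \sum_{m=0}^{d} \mathbb{E}\left[I^S_m \cdot h_m\right] = \sum_{m=0}^{d} \mathbb{E}\left[I^S_m \cdot \mathbf{1}_U \cdot h_m\right]. \qquad (7.1)$$

Two applications of the Cauchy-Schwarz inequality yield

$$\sum_{m=0}^{d} \mathbb{E}\left[I^S_m \cdot \mathbf{1}_U \cdot h_m\right] \le \sum_{m=0}^{d} \left\|I^S_m \cdot \mathbf{1}_U\right\|_2 \left\|h_m\right\|_2 \le \sqrt{\sum_{m=0}^{d} \left\|I^S_m \cdot \mathbf{1}_U\right\|_2^2} \cdot \sqrt{\sum_{m=0}^{d} \left\|h_m\right\|_2^2}. \qquad (7.2)$$

We now bound the two terms on the right-hand side. For the first term, we have

$$\sum_{m=0}^{d} \left\|I^S_m \cdot \mathbf{1}_U\right\|_2^2 = \sum_{m=0}^{d} \mathbb{E}\left[I^S_m(x)^2 \cdot \mathbf{1}_U(x)^2\right] = \mathbb{E}\left[\mathbf{1}_U(x)^2 \sum_{m=0}^{d} I^S_m(x)^2\right] = 2^{d'-d} \qquad (7.3)$$

where the last equality follows from the fact that for every $x \in \{0,1\}^n$, there is exactly
one $m$ for which $I^S_m(x) = 1$.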

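We now examine the second term. By Parseval's Identity, $\|h_m\|_2^2 = \sum_{\alpha \in \{0,1\}^S} \hat{h}_m(\chi_\alpha)^2$.
Suppose that the image of $f$ has dimension $d' \le d$. Then, since $h_m$ is a pushforward,

$$\hat{h}_m(\chi) = 2^{-d} \left(\hat{I}^{\bar{S}}_{n/2-m-1}(\chi \circ f) - \hat{I}^{\bar{S}}_{n/2-m+1}(\chi \circ f)\right).$$

The characters $\chi \circ f$ depend only on the restriction of $\chi$ to $f(\{0,1\}^{\bar{S}})$. Thus these
characters all lie in some subspace $W \subseteq \{0,1\}^{\bar{S}}$ of dimension $d'$, with each character appearing
$2^{d-d'}$ times. Thus, we have that

$$\|h_m\|_2^2 = 2^{-d-d'} \sum_{\chi \in W} \left(\hat{I}^{\bar{S}}_{n/2-m-1}(\chi) - \hat{I}^{\bar{S}}_{n/2-m+1}(\chi)\right)^2.$$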

For any set $\chi \subseteq \bar{S}$, we can apply Facts 4.13 and 4.12(i) to obtain

tLm+i(x)  -  J|-™-,(x)  =  2-("-d)^|:t+i(ixl  + 1). 


Therefore,  El=o  IIMI  =  2  2"+d  ''  E„,  Ex€h/  +  l)2  and  by  Fact 4. 12(ii), 

£  HMl  £  2-2ndd-d'  V  (-!)«+■  tf^?1)(2|x|  +  2)  (7.4) 


ra=0 


xew 


There exist some $d'$ coordinates such that the projection of $W$ onto those coordinates
is surjective. Therefore the number of elements of $W$ with weight at most $\ell$ is at most
$\sum_{j=0}^{\ell} \binom{d'}{j}$. We also have a similar bound on the number of elements of $W$ of size at
least $n - d - \ell$. Therefore, since by Fact 4.12(v) the summand in (7.4) is decreasing in
$\min(|\chi|, n - d - |\chi|)$, we have


d 

E 

m= 0 


II  h. 


m  ||2  — 


<  2 


— 2n-\-d—  d'-\-l 


d! 

E 

3=0 
By Fact 4.12(iii), the sum on the right-hand side evaluates to $-K^{(2(n-d-d')+2)}_{n-d-d'+1}(2)$. We can
then apply the generating function representation of Krawtchouk polynomials to obtain


$$\sum_{m=0}^{d} \|h_m\|_2^2 \le -2^{-2n+d+d'+1} \, [x^{n-d-d'+1}] \left((1 - x)^2 (1 + x)^{2(n-d-d')}\right)$$
$$= 2^{-2n+d+d'+2} \left(\binom{2(n-d-d')}{n-d-d'} - \binom{2(n-d-d')}{n-d-d'-1}\right)$$
$$= 2^{-d-d'} O\left((n-d-d')^{-3/2}\right) = 2^{-d-d'} O\left((n-2d)^{-3/2}\right).$$


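Thus we have that

$$\mathbb{E}\left[I_V \cdot \left(I_{W_{n/2+1}} - I_{W_{n/2-1}}\right)\right] \le \sqrt{2^{d'-d}} \sqrt{2^{-d-d'} O\left((n-2d)^{-3/2}\right)} = 2^{-d} O\left((n-2d)^{-3/4}\right).$$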

When  d  —  |  —  cn2^3  for  some  large  enough  constant  c  >  0,  we  therefore  have  E  |  ly  ■ 
(. I\Vn+1  -  (nni1)2~n~d  and  the  lemma  follows.  □ 
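The binomial step in the proof above relies on the difference $\binom{2M}{M} - \binom{2M}{M-1}$ (with $M = n - d - d'$) decaying like $4^M M^{-3/2}$. The following Python sketch checks this numerically; it is only a sanity check under the stated reading of the formula, and the helper names are hypothetical.

```python
from math import comb, pi

def central_difference(M):
    """Return C(2M, M) - C(2M, M-1), the difference appearing in the proof."""
    return comb(2 * M, M) - comb(2 * M, M - 1)

def normalized_difference(M):
    """Rescale the difference by M^{3/2} / 4^M; it should approach 1/sqrt(pi)."""
    # Integer true division keeps precision even when both operands are huge.
    return (central_difference(M) / 4 ** M) * M ** 1.5

# The difference is exactly the M-th Catalan number C(2M, M) / (M + 1).
for M in (1, 5, 50):
    assert central_difference(M) * (M + 1) == comb(2 * M, M)
```

Since the rescaled quantity stays bounded, multiplying by the prefactor $2^{-2n+d+d'+2} = 4 \cdot 2^{-d-d'} \cdot 4^{-M}$ leaves $2^{-d-d'} O(M^{-3/2})$, as used in the proof.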


7.2  Proof  of  Theorem  7.1 


We are now ready to complete the proof of Theorem 7.1 by using the lemma from the last
section.

Theorem 7.1 (Restated). For any $0 < k < n$ and any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $k$-linearity
requires at least $\min\{k, n-k\} \cdot (1 - o(1))$ queries.


Proof. We first prove the special case where $k = \frac{n}{2} - 1$. There is a natural bijection
between linear functions $\{0,1\}^n \to \{0,1\}$ and vectors in $\{0,1\}^n$: associate $f(x) = \bigoplus_{i \in S} x_i$
with the vector $a \in \{0,1\}^n$ whose coordinates satisfy $a_i = \mathbf{1}[i \in S]$. Note that $f(x) = a \cdot x$.

For $0 \le t \le n$, let $W_t \subseteq \{0,1\}^n$ denote the set of elements of Hamming weight $t$. Fix
any set $X \subseteq \{0,1\}^n$ of $q < \frac{n}{2} - O(n^{2/3})$ queries and any response vector $r \in \{0,1\}^q$. The
set of linear functions that return the response vector $r$ to the queries in $X$ corresponds in
our bijection to an affine subspace $V \subseteq \{0,1\}^n$ of dimension $n - q$. This is because for
each query $x^{(i)} \in X$, the requirement that $f(x^{(i)}) = r_i$ imposes an affine linear relation on $f$. By
Lemma 7.2, this subspace satisfies the inequality


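$$\left| \frac{|V \cap W_{n/2-1}|}{|W_{n/2-1}|} - \frac{|V \cap W_{n/2+1}|}{|W_{n/2+1}|} \right| \le \cdots \qquad (7.5)$$

Define $D_{\mathrm{yes}}$ and $D_{\mathrm{no}}$ to be the uniform distributions over $(\frac{n}{2} - 1)$-linear and
$(\frac{n}{2} + 1)$-linear functions, respectively. By our bijection, $D_{\mathrm{yes}}$ and $D_{\mathrm{no}}$ correspond to the uniform
distributions over $W_{n/2-1}$ and $W_{n/2+1}$. As a result, the probability that a function drawn from
$D_{\mathrm{yes}}$ or from $D_{\mathrm{no}}$ returns the response $r$ to the set of queries $X$ is

$$\Pr_{f \sim D_{\mathrm{yes}}}[f(X) = r] = \frac{|V \cap W_{n/2-1}|}{|W_{n/2-1}|} \quad \text{and} \quad \Pr_{f \sim D_{\mathrm{no}}}[f(X) = r] = \frac{|V \cap W_{n/2+1}|}{|W_{n/2+1}|}.$$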
So (7.5) and Lemma 3.15 imply that at least $\frac{n}{2} - O(n^{2/3})$ queries are required to distinguish
$(\frac{n}{2} - 1)$-linear and $(\frac{n}{2} + 1)$-linear functions. All $(\frac{n}{2} + 1)$-linear functions are $\frac{1}{2}$-far from
$(\frac{n}{2} - 1)$-linear functions, so this completes the proof of the theorem for $k = \frac{n}{2} - 1$.
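To make the bijection concrete, the following Python sketch computes, by brute force, the probability that a uniformly random $t$-linear function returns a given response vector to a fixed set of queries. This is the quantity $|V \cap W_t| / |W_t|$ from the argument above. The function name and the integer encoding of vectors are assumptions of this sketch, and the enumeration is only feasible for small $n$.

```python
from itertools import combinations

def response_probability(n, weight, queries, responses):
    """Fraction of weight-`weight` vectors a with a.x = r for every (x, r).

    Each vector a encodes the parity function x -> a.x (mod 2); queries are
    integers whose bits are the coordinates of x.
    """
    total = 0
    consistent = 0
    for support in combinations(range(n), weight):
        a = sum(1 << i for i in support)
        total += 1
        if all((bin(a & x).count("1") & 1) == r
               for x, r in zip(queries, responses)):
            consistent += 1
    return consistent / total

# For n = 8, compare the layers n/2 - 1 = 3 and n/2 + 1 = 5 on one query e^1.
p_yes = response_probability(8, 3, [0b1], [1])
p_no = response_probability(8, 5, [0b1], [1])
```

For such a small $n$ a single query already separates the two layers noticeably; the lemma concerns the regime of large $n$, where fewer than roughly $n/2$ queries leave the two probabilities nearly equal.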

For other values of $k$, we apply a simple padding argument. When $k < \frac{n}{2} - 1$, modify
$D_{\mathrm{yes}}$ and $D_{\mathrm{no}}$ to be uniform distributions over $k$-linear and $(k+2)$-linear functions,
respectively, under the restriction that all coordinates in the sum are taken from the set $[2k+2]$.
This modification with $k = \frac{n}{2} - 2$ shows that $\frac{n}{2} - O(n^{2/3})$ queries are required to distinguish
$(\frac{n}{2} - 2)$- and $\frac{n}{2}$-linear functions; this implies the lower bound in the theorem for the case
$k = \frac{n}{2}$. □


7.3  Implications 

By examining the proof of Theorem 7.1, we see that the proof actually establishes a
stronger result: even if we are promised that the input function is either a $k$-linear function
or a $(k+2)$-linear function, we still need $\min\{k, n-k\} \cdot (1 - o(1))$ queries to distinguish
between the two cases. In other words, we proved the following lemma.

Lemma 7.3. For any $0 < k < n$ and any $0 < \epsilon < \frac{1}{2}$, at least $\min\{k, n-k\} \cdot (1 - o(1))$
queries are required to distinguish $k$-linear and $(k+2)$-linear functions with probability
at least $\frac{2}{3}$.


This lemma is particularly helpful for establishing other lower bounds in property
testing. Given a property $\mathcal{P}$, if we can identify some $0 < k < n$ such that $k$-linear functions
are contained in $\mathcal{P}$ and $(k+2)$-linear functions are $\epsilon$-far from $\mathcal{P}$, then we immediately
obtain a lower bound of $\min\{k, n-k\} \cdot (1 - o(1))$ queries on the query complexity for
$\epsilon$-testing $\mathcal{P}$. In the rest of this section, we apply this method to a number of different
properties. A summary of the results obtained in this section is presented in Table 7.1.

Property                  | Lower bound             | Previous lower bound                                   | Upper bounds
$k$-linearity             | $k - o(k)$              | $\Omega(\sqrt{k})$ [57]; $\Omega(k)$ (n.a.) [57]       | $O(k \log k)$ [40]; $O(n)$ (folklore)
$\le k$-linearity         | $k - o(k)$              | $\Omega(k/\log k)$ [57, 38]; $\Omega(k)$ (n.a.) [57]   | $O(k \log k)$ [40]; $O(n)$ (folklore)
$k$-juntas                | $k - o(k)$              | $\Omega(k)$ [41]                                       | $O(k \log k)$ [20]
Fourier degree $\le d$    | $d - o(d)$              | $\Omega(d)$ [40]                                       | $2^{O(d)}$ [46, 40]
$s$-sparse polynomials    | $s - o(s)$              | [39]                                                   | $O(s)$ [39]
size-$s$ branching programs | $s - o(s)$            | [39]                                                   | $O(s)$ [39]
size-$s$ decision trees   | $\log s - o(\log s)$    | $\Omega(\log s)$ [39]                                  | $O(s)$ [39]

Table 7.1: Results implied by Lemma 7.3. All bounds are stated under the assumption that $k$, $d$,
and $s$ are at most $\frac{n}{2}$. Bold font indicates an asymptotic improvement over the previous bounds.
Bounds labeled with (n.a.) apply only to non-adaptive testers.

7.3.1 $\le k$-linearity

We begin with an easy corollary. The function $f : \{0,1\}^n \to \{0,1\}$ is $\le k$-linear iff it is
$i$-linear for some $i \le k$. Lemma 7.3 gives a strong lower bound for the query complexity
of testing this property.

Corollary 7.4. For any $0 < k < n$ and any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $\le k$-linearity requires at
least $\min\{k, n-k\} \cdot (1 - o(1))$ queries.

Proof. By definition, $k$-linear functions are also $\le k$-linear. Proposition 2.24 implies that
$(k+2)$-linear functions are $\frac{1}{2}$-far from every $\le k$-linear function. Therefore, any
$\le k$-linearity tester must be able to distinguish $k$-linear and $(k+2)$-linear functions with
probability at least $\frac{2}{3}$ and the corollary follows from Lemma 7.3. □

7.3.2  Fourier  degree 

Recall that the function $f : \{0,1\}^n \to \{0,1\}$ has Fourier degree at most $d$ if $\hat{f}(\alpha) = 0$ for
every element $\alpha \in \{0,1\}^n$ of Hamming weight $\|\alpha\| > d$.

Corollary 7.5. For any $0 < k < n$ and any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $f : \{0,1\}^n \to \{0,1\}$ for
the property of having Fourier degree at most $k$ requires at least $\min\{k, n-k\} \cdot (1 - o(1))$
queries.

Proof. By Proposition 2.23, $k$-linear functions have Fourier degree $k$ and by
Proposition 2.24, $(k+2)$-linear functions are $\frac{1}{2}$-far from all functions with Fourier degree at most
$k$. So the corollary again follows directly from Lemma 7.3. □


7.3.3  Juntas 

Another easy corollary of Lemma 7.3 gives a good lower bound on the query complexity
for testing juntas.

Corollary 7.6. For any $0 < k < n$ and any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $k$-juntas requires at least
$\min\{k, n-k\} \cdot (1 - o(1))$ queries.

Proof. When $f : \{0,1\}^n \to \{0,1\}$ is $k$-linear, it is clearly also a $k$-junta. When $f$ is
$(k+2)$-linear, then Proposition 2.24 implies that $f$ is $\frac{1}{2}$-far from all $k$-junta functions. The
corollary follows immediately from Lemma 7.3. □


7.3.4  Sparse  polynomials 

The function $f : \{0,1\}^n \to \{0,1\}$ is an $s$-sparse polynomial if the corresponding function
$f_{\mathbb{F}_2} : \mathbb{F}_2^n \to \mathbb{F}_2$ is a polynomial with at most $s$ monomials. We can again use Lemma 7.3
to prove a lower bound on the query complexity for testing $s$-sparse polynomials. In order
to do so, however, we need to show that $(s+2)$-linear functions are far from all $s$-sparse
polynomials. This was first done by Diakonikolas et al. [46, Thm. 36].

Lemma 7.7 (Diakonikolas et al. [46]). Fix $0 < k < n - 2$. Let $f : \{0,1\}^n \to \{0,1\}$ be
a $(k+2)$-linear function. Then $f$ is $\frac{1}{20}$-far from every $k$-sparse polynomial.

Proof. Without loss of generality, let $f : x \mapsto x_1 \oplus \cdots \oplus x_{k+2}$. Let $g$ be a $k$-sparse
polynomial, i.e. $g = T_1 \oplus \cdots \oplus T_k$ where each $T_i$ is a monomial. We want to show that
$f$ and $g$ are far. We can assume without loss of generality that $g$ does not contain any
length-1 terms, since if it did we could just subtract those terms off of both $f$ and $g$ to
create $f'$ and $g'$, which have the same distance from each other. We could then prove the
theorem for $f'$, $g'$, and a smaller value of $k$.

Define the influence of a variable $x_i$ in $f$, denoted $\mathrm{Inf}_i(f)$, in the standard way, i.e.
$\mathrm{Inf}_i(f) = \Pr_x[f(x) \ne f(x^{\oplus i})]$ where $x^{\oplus i}$ denotes $x$ with the $i$th bit flipped. Define the
total influence of $f$ to be $\sum_i \mathrm{Inf}_i(f)$.

For any $f$ and $g$, it is straightforward to show that if for some $i$ the difference
$|\mathrm{Inf}_i(f) - \mathrm{Inf}_i(g)|$ is at least $\delta$, then $f$ and $g$ must have distance at least $\delta/2$. When $f$ is the
$(k+2)$-linear function defined above, each variable $x_1$ through $x_{k+2}$ has influence 1. Thus, to
complete the proof, we will show that in $g$ one of these variables must have influence at
most 0.9.

If the total influence of $x_1$ through $x_{k+2}$ in $g$ is less than $0.9(k+2)$, then we are done,
since the pigeonhole principle implies the existence of a variable $x_i$ with influence at most
0.9. Thus, in what follows, we assume

$$\sum_{i=1}^{k+2} \mathrm{Inf}_i(g) \ge 0.9(k+2). \qquad (7.6)$$

We can bound the total influence of $x_1$ through $x_{k+2}$ in $g$ as follows. First, we write
$g = g_2 \oplus g_3$ where $g_2$ is the collection of terms in $g$ that have length 2, and $g_3$ is the
collection of terms in $g$ that have length at least 3. Now note:

• Each variable $x_i$ that appears in $g_2$ has $\mathrm{Inf}_i(g_2) = 1/2$. The reason is that, since
every term of $g_2$ has length 2, $x_i$ is influential exactly when the other variables it
appears with have parity 1, which happens exactly half the time.

• For each term in $g_3$, the total contribution of that term to the influences of all the
variables is at most 3/4. To see why, suppose the term has length $m$; then on a
random assignment the probability that a variable is relevant to that term is $\frac{1}{2^{m-1}}$, so
the total effect the term can have on all the influences is at most $m \cdot \frac{1}{2^{m-1}}$. If $m \ge 3$,
this is at most 3/4.

Let $R_2$ be the number of terms of $g_2$, and $R_3$ be the number of terms in $g_3$. By
hypothesis, $R_2 + R_3 \le k$. Since each term of $g_2$ contributes at most 1 to the total influence of $g$,
and each term of $g_3$ contributes at most 3/4 to the total influence of $g$, we have that

$$\sum_{i=1}^{k+2} \mathrm{Inf}_i(g) \le R_2 + (3/4) R_3. \qquad (7.7)$$

Combining equations 7.6 and 7.7 we get that $R_2 + (3/4)R_3 \ge (9/10)k$. Using the fact that
$R_2 + R_3 \le k$, this implies that $R_3 \le (4/10)k$; in other words, there cannot be too many
terms of length 3 or more in $g$. Now we can bound the influence of variables $x_1$ through
$x_{k+2}$ in $g$:

$$\sum_{i=1}^{k+2} \mathrm{Inf}_i(g) \le \sum_{i=1}^{k+2} \left[\mathrm{Inf}_i(g_2) + \mathrm{Inf}_i(g_3)\right]$$
$$\le \sum_{i=1}^{k+2} \mathrm{Inf}_i(g_2) + \sum_{i=1}^{n} \mathrm{Inf}_i(g_3)$$
$$\le \frac{1}{2}(k+2) + \frac{3}{4} \cdot R_3$$
$$\le \frac{1}{2}(k+2) + \frac{3}{10} \cdot k$$
$$< 0.9(k+2).$$

By the pigeonhole principle, there must exist a variable $x_i$ with influence at most 0.9 in
$g$. □
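The influence quantities used in the proof above can be checked by brute force on small examples. The sketch below is an illustration only: the representation of a polynomial over $\mathbb{F}_2$ as a list of monomials (tuples of variable indices) and all function names are assumptions of this example.

```python
from itertools import product

def evaluate(poly, x):
    """Evaluate an F_2 polynomial given as a list of monomials (index tuples)."""
    return sum(all(x[i] for i in mono) for mono in poly) % 2

def influence(poly, n, i):
    """Probability over uniform x that flipping bit i changes the value."""
    flips = 0
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1
        flips += evaluate(poly, x) != evaluate(poly, y)
    return flips / 2 ** n

def distance(p, q, n):
    """Fraction of inputs on which the two polynomials disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(evaluate(p, x) != evaluate(q, x) for x in points) / len(points)

# k = 2: a 4-linear function versus a 2-sparse polynomial with length-2 terms.
f = [(0,), (1,), (2,), (3,)]
g = [(0, 1), (2, 3)]
```

Here every variable has influence 1 in `f` and influence 1/2 in `g`, so the influence gap of 1/2 forces a distance of at least 1/4; the brute-force distance is consistent with that bound.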

Theorem 7.8. For any $0 < s < n$ and any $0 < \epsilon < \frac{1}{20}$, $\epsilon$-testing $s$-sparse polynomials
requires at least $\min\{s, n-s\} \cdot (1 - o(1))$ queries.

Proof. When $f : \{0,1\}^n \to \{0,1\}$ is $s$-linear, it is an $s$-sparse polynomial. When $f$ is
$(s+2)$-linear, then Lemma 7.7 implies that $f$ is $\frac{1}{20}$-far from all $s$-sparse polynomials. The
theorem follows immediately from Lemma 7.3. □

7.3.5  Decision  trees 

We can also use Lemma 7.3 to prove lower bounds for properties related to the
computational complexity of a boolean function. We first show how it implies a lower bound for
testing whether a function can be computed by a small decision tree.

Lemma 7.9. Fix $s \ge 1$ and $0 < \alpha < 1$. Let $f : \{0,1\}^n \to \{0,1\}$ be an $s$-linear function.
Then $f$ can be computed by a decision tree of size $2^s$ and is $\frac{1-\alpha}{2}$-far from all functions that
are computable by decision trees of size at most $\alpha 2^s$.

Proof. To construct a decision tree of size $2^s$ that computes the function $f : x \mapsto x_{i_1} \oplus
\cdots \oplus x_{i_s}$, create a complete tree of depth $s$ where each node at level $j$ of the tree queries
$x_{i_j}$. This tree has $2^s$ leaves and, by setting the value of each leaf appropriately, computes
the function $f$ exactly.

Consider now a decision tree $T$ of size at most $\alpha 2^s$, and let $g : \{0,1\}^n \to \{0,1\}$ be
the function computed by this tree. We want to show that $\Pr[f(x) \ne g(x)] \ge \frac{1-\alpha}{2}$ when
the probability is over the uniform distribution of $x \in \{0,1\}^n$. For each leaf $\ell$ of $T$, let
$\mathrm{depth}(\ell)$ denote the number of unique variables queried by the nodes in the path from the
root of $T$ to $\ell$ and let $R_\ell \subseteq \{0,1\}^n$ represent the set of inputs $x \in \{0,1\}^n$ that define
a path in $T$ that reaches $\ell$. (Note that the sets $R_\ell$ form a partition of $\{0,1\}^n$.) Define
$B := \bigcup_{\ell : \mathrm{depth}(\ell) < s} R_\ell$ to be the union of the sets $R_\ell$ for all the leaves in $T$ of depth strictly
less than $s$. Then

$$\Pr[f(x) \ne g(x)] \ge \Pr[f(x) \ne g(x) \wedge x \in B] = \Pr[x \in B] \cdot \Pr[f(x) \ne g(x) \mid x \in B].$$

For any leaf $\ell$ of $T$, the probability that an input $x$ chosen uniformly at random from
$\{0,1\}^n$ reaches $\ell$ is $2^{-\mathrm{depth}(\ell)}$. By the union bound, the probability that $x$ reaches a leaf of
depth at least $s$ in $T$ is at most $\alpha 2^s \cdot 2^{-s} = \alpha$, so $\Pr[x \in B] \ge 1 - \alpha$.

Let $\ell$ be a leaf in $T$ of depth at most $s - 1$. Then there is some index $i \in \{i_1, \ldots, i_s\}$
that is not queried in the path from the root of $T$ to $\ell$. We can partition $R_\ell$ into pairs
$(x, x^{(i)})$ where each pair is identical in all but the $i$th coordinate. For each such pair,
$f(x) \ne f(x^{(i)})$, so no matter what label is attached to the leaf $\ell$, we have $\Pr[f(x) \ne g(x) \mid
x \in R_\ell] = \frac{1}{2}$. This also implies that $\Pr[f(x) \ne g(x) \mid x \in B] = \frac{1}{2}$ and, therefore,
$\Pr[f(x) \ne g(x)] \ge (1 - \alpha) \cdot \frac{1}{2} = \frac{1-\alpha}{2}$, as we wanted to show. □
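The construction in the proof above can be illustrated with a small Python sketch. It builds the complete decision tree for a parity and measures, by brute force, its distance to a shallower tree. The tuple-based tree encoding and the function names are assumptions made for this example.

```python
from itertools import product

def parity_tree(variables, acc=0):
    """Complete decision tree for the parity of `variables`.

    A leaf is an int label; an internal node is the tuple
    (variable, subtree_if_0, subtree_if_1).
    """
    if not variables:
        return acc
    first, rest = variables[0], variables[1:]
    return (first, parity_tree(rest, acc), parity_tree(rest, acc ^ 1))

def evaluate(tree, x):
    """Follow the path selected by x until a leaf label is reached."""
    while isinstance(tree, tuple):
        var, if_zero, if_one = tree
        tree = if_one if x[var] else if_zero
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

def distance(tree_a, tree_b, n):
    """Fraction of inputs in {0,1}^n on which the two trees disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(evaluate(tree_a, x) != evaluate(tree_b, x) for x in points) / len(points)

full = parity_tree([0, 1, 2])     # s = 3, so the tree has 2^3 leaves
shallow = parity_tree([0, 1])     # only 4 leaves, i.e. alpha = 1/2
```

With $s = 3$ and $\alpha = 1/2$, the lemma guarantees distance at least $\frac{1-\alpha}{2} = \frac{1}{4}$ for any tree with at most 4 leaves; this particular shallow tree ignores $x_3$ entirely and is at distance $\frac{1}{2}$.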

Theorem 7.10. For any $0 < s < n$ and any $0 < \epsilon < \frac{3}{8}$, $\epsilon$-testing size-$2^s$ decision trees
requires at least $\min\{s, n-s\} \cdot (1 - o(1))$ queries.

Proof. By Lemma 7.9, when $f : \{0,1\}^n \to \{0,1\}$ is $s$-linear it can be computed by a
decision tree of size $2^s$ and when $f$ is $(s+2)$-linear it is $\frac{3}{8}$-far from all decision trees of
size $2^s$. The theorem follows immediately from Lemma 7.3. □

7.3.6  Branching  programs 

We show that Lemma 7.9 also implies a nearly-optimal lower bound on the query
complexity for testing whether a function can be computed by a small-size branching program.

Lemma 7.11. Let $\mathcal{P}$ be the class of all boolean functions computable by branching
programs of size $2s$. Then every $s$-linear function is in $\mathcal{P}$ while every $(s+2)$-linear function
is far from $\mathcal{P}$.

Proof. Again, the first assertion is almost immediate: consider a branching program of
width 2 that queries $x_{i_1}$ at the start node and queries $x_{i_j}$ on both nodes at level $1 < j \le s$.
We can arrange the edges of this branching program so that the left (resp., right) node at
level $j$ is reached when $x_{i_1} \oplus \cdots \oplus x_{i_{j-1}}$ equals 0 (resp., equals 1). This branching program
has size $2s - 1$ and computes the $s$-linear function.

For the second assertion, let $P$ be a branching program of size $2s$, and suppose it is
close to some $(s+2)$-linear function $h$. Note that if one of the $s+2$ variables in $h$ does not
appear in $P$, then $h$ and $P$ are $\frac{1}{2}$-far, since for every input there is a variable whose value
we can flip to change the value of $h$ without changing the output of $P$.

Thus, we assume that every variable in $h$ appears in $P$. Moreover, since $P$ has only
$2s$ nodes, there must be at least two variables in $h$ that are queried only once in $P$. Let $x_1$
and $x_2$ denote two such variables, and let $u_1$ and $u_2$ denote the corresponding nodes in $P$.
The graph of $P$ is directed and acyclic, so we can assume without loss of generality that
no path reaches the node $u_1$ after reaching $u_2$.

Consider  the  paths  in  P  generated  by  strings  x,  x^  €  {0,  l}n,  where  x  is  generated 
uniformly  at  random  and  x(l>  is  generated  from  x  by  flipping  x\.  Note  that  x(l)  is  also 
uniform.  If  the  random  path  generated  by  x  reaches  u2  with  probability  less  than  {,  then 
with  probability  at  least  {,  flipping  the  value  of  x2  changes  the  value  of  h  without  changing 
the  output  of  P;  hence,  P  is  {-far  from  h.  On  the  other  hand,  if  this  random  path  reaches  u2 
with  probability  at  least  {,  then  the  path  generated  by  x(l^  also  reaches  u2  with  probability 
{.  By  the  union  bound,  the  probability  that  both  x  and  x(i  >  describe  paths  in  P  reaching 
u2  is  at  least  {.  But  since  u\  cannot  be  reached  after  u2,  this  means  that  both  x  and 
describe  paths  to  the  same  terminal  in  P  even  though  they  have  different  values  in  h. 
Therefore,  P  is  {-far  from  h  in  this  case  too.  □ 


Theorem  7.12.  For  any  0  <  s  <  n  and  any  0  <  e  <  {,  e-testing  size-s  branching 
programs requires at least $\min\{s, n-s\} \cdot (1 - o(1))$ queries.

Proof. By Lemma 7.11, when $f : \{0,1\}^n \to \{0,1\}$ is $s$-linear it can be computed by a
branching program of size $s$ and when $f$ is $(s+2)$-linear it is far from all functions
computable by branching programs of size $s$. The theorem follows immediately from
Lemma 7.3. □

7.3.7  Small-width  OBDDs 

An ordered binary decision diagram (or OBDD) is an acyclic graph with a single root and
at most $n + 1$ levels of nodes. There are two nodes in the last level. These are called sink
nodes; one is labeled 0 and the other one is labeled 1. The nodes in every other level have
out-degree two, with one edge labeled 0 and the other one labeled 1. All edges leaving a
node from the $\ell$th level go to a node in the $(\ell + 1)$st level. Each level is labeled with some
index $i \in [n]$. The first level of any OBDD contains a single node. The width of an OBDD
is the maximum number of nodes at any level.

We say that an OBDD computes the function $f : \{0,1\}^n \to \{0,1\}$ if for every $x \in
\{0,1\}^n$, the path through the OBDD defined by following the edge labeled $x_i$ from the
current node, where $i$ is the label associated with the current node's level, leads to the sink
node labeled with $f(x)$.

There are two natural questions that we may ask related to property testing and
small-width OBDDs (or, for that matter, any other complexity class). How many queries do we
need to test if a function is computable by OBDDs of width $w$? And are there properties
that contain only functions computable by OBDDs of width $w$ that are much harder to
test?

Ron and Tsur [88] showed that testing whether a function is computable by width-2
OBDDs can be done with $O(\log n)$ queries. Goldreich [57] showed that there are
properties that contain only functions computable by width-2 OBDDs that are much harder to
test: they require $\Omega(n)$ queries to test. Theorem 7.1 immediately yields a slight sharpening
of this result.

Corollary 7.13. There is a property $\mathcal{P}$ containing only functions computable by width-2
OBDDs that requires $\frac n2 - o(n)$ queries to test.

Proof. Note that all linear functions are computable by width-2 OBDDs. To see this,
consider $f(x) = x_{i_1} \oplus \cdots \oplus x_{i_k}$. We can build a width-2 OBDD with $k+1$ levels where
the levels are labeled with $i_1, \ldots, i_k$. For the levels $2, \ldots, k+1$, we associate one node
with the value 1 and the other with the value 0. Let the edges labeled 0 (resp., 1) go to the
node of the next level with the same (resp., different) value as the current node. Then the
value of the node on level $\ell$ represents the parity of the variables $x_{i_1}, \ldots, x_{i_{\ell-1}}$.

As a result, all $\frac n2$-linear functions are also computable by width-2 OBDDs and the
corollary follows directly from Theorem 7.1. □
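
The width-2 construction used in this proof can be sketched in a few lines of Python. This is only an illustration under our own conventions: variables are indexed from 0, a node is identified by the parity it represents, and the names `parity_obdd`, `evaluate` and `width` are hypothetical helpers rather than anything defined in the thesis.

```python
from itertools import product

def parity_obdd(indices):
    """Return the levels of a width-2 OBDD for the XOR of x[i], i in indices.

    Each level is (variable index, edges), where edges[state][bit] is the node
    reached on the next level. A node's state is the parity read so far.
    """
    levels = []
    for var in indices:
        # The edge labeled 0 keeps the state; the edge labeled 1 flips it.
        edges = {state: {0: state, 1: state ^ 1} for state in (0, 1)}
        levels.append((var, edges))
    return levels

def evaluate(obdd, x):
    state = 0  # the single root node carries the parity of no variables
    for var, edges in obdd:
        state = edges[state][x[var]]
    return state  # label of the sink that was reached

def width(obdd):
    """Maximum number of reachable nodes on any level, sinks included."""
    reachable, widest = {0}, 1
    for _, edges in obdd:
        reachable = {edges[s][b] for s in reachable for b in (0, 1)}
        widest = max(widest, len(reachable))
    return widest
```

For instance, `parity_obdd([0, 2, 3])` has three labeled levels plus the sink level, and `evaluate` follows exactly one edge per level.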


7.4  Notes  and  Discussion 

Alternative proof. The proof of the lower bound for testing $k$-linearity presented in this
section was obtained in joint work with Daniel Kane [24]. In a simultaneous but separate
project with Joshua Brody and Kevin Matulef [22, 23], we obtained a different proof of
the same (asymptotic) lower bound. That proof is presented in Chapter 11.

Non-adaptive testing of $k$-linearity. The manuscript [24] includes the result presented
in this chapter as well as one more lower bound. The second lower bound shows that
testing $k$-linearity non-adaptively requires $2 \cdot \min\{k, n-k\} - O(1)$ queries. (This lower
bound is twice as large as the lower bound for adaptive algorithms.)

Asymptotically, the two lower bounds are equivalent. The non-adaptive lower bound,
however, says something much stronger about the problem of testing $\frac n2$-linearity: it says
that no non-adaptive tester can improve on the query complexity of the folklore tester by
more than an additive constant.

One may wonder if the same strong statement also applies to adaptive algorithms: is it
possible to show that all adaptive testers for $\frac n2$-linearity also require $n - O(1)$ queries? The
answer is no. In [24], we show that there is an adaptive algorithm for testing $\frac n2$-linearity
with  | n  +  o(n )  queries.  This  shows  that  it  is  indeed  possible  to  beat  the  folklore  tester 
by more than an additive constant if we can choose the queries adaptively. It also gives a
gap, albeit a small one, between the query complexities of testing $\frac n2$-linearity adaptively
and non-adaptively.

Part II

Testing Function Isomorphism

Chapter 8


Testing Isomorphism to Partially Symmetric Functions

The first part of this thesis was mainly concerned with determining the exact query
complexity for testing various properties of boolean functions. The current part aims to
understand why some properties of boolean functions can be tested with very few queries,
while other properties cannot.

In the last few years, there has been great progress in understanding the testability
of graph properties (properties of graphs $G = (V, E)$ that are invariant under relabeling
of the vertices) and the problem of characterizing the set of graph
properties that are testable with a constant number of queries has been largely solved [5,
6, 7]. Recently, there has also been much progress in understanding the testability of
hypergraph properties [11] and the testability of algebraic properties of functions, that is,
properties that are invariant under linear or affine transformations [18, 72, 95].

Our understanding of the testability of (non-algebraic) properties of boolean functions,
however, remains largely incomplete. One approach for remedying this situation, first
proposed by Fischer et al. [52], is to study the function isomorphism testing problem. Two
boolean functions $f, g : \{0,1\}^n \to \{0,1\}$ are said to be isomorphic if they are identical up
to permutation of the input labels. The $f$-isomorphism $\epsilon$-testing problem asks a tester to
determine, with as few queries as possible, whether a function $g$ is isomorphic to $f$ or
whether it is $\epsilon$-far from being so. When it is possible to $\epsilon$-test $f$-isomorphism with a
number of queries that depends on $\epsilon$ but not on $n$, we say that $f$ is efficiently
isomorphism-testable. The function isomorphism testing (characterization) problem asks us
to determine the set of functions that are efficiently isomorphism-testable.

The current chapter's focus is on functions that are efficiently isomorphism-testable.
That is, we seek upper bounds on the query complexity for testing $f$-isomorphism for
different functions $f$. Lower bounds on the query complexity for testing function
isomorphism will be established in the next two chapters.


Previous work. Let's begin with some folklore observations. When $f : \{0,1\}^n \to
\{0,1\}$ is a (fully) symmetric function, testing isomorphism to $f$ is equivalent to testing
identity to $f$, since the only function isomorphic to $f$ is $f$ itself. So in this case $\epsilon$-testing
$f$-isomorphism can be done with $O(1/\epsilon)$ queries.

Another folklore result is obtained with a similar argument. For any function $f :
\{0,1\}^n \to \{0,1\}$, there are at most $n!$ functions $g$ that are isomorphic to $f$. As a result, by
Corollary 3.14, we can $\epsilon$-test $f$-isomorphism with $O(\log(n!)/\epsilon) = O(n \log(n)/\epsilon)$ queries.

The two above observations can be unified and generalized in the following way. For
the function $f : \{0,1\}^n \to \{0,1\}$, let $\mu(f) = |\{f_\pi : \pi \in S_n\}|$ denote the number
of distinct functions that are isomorphic to $f$. Then by Corollary 3.14 we can test
$f$-isomorphism with $O(\log \mu(f)/\epsilon)$ queries.
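
For small $n$, the number of distinct functions isomorphic to $f$ can be computed by brute force. The sketch below is only an illustration: it enumerates all $n!$ permutations, represents functions by their truth tables, and uses hypothetical helper names (`orbit`, `num_isomorphic`, `is_isomorphic`) that do not come from the thesis.

```python
from itertools import permutations, product

def truth_table(f, n):
    return tuple(f(x) for x in product((0, 1), repeat=n))

def permuted(f, pi):
    """Return f_pi, defined by f_pi(x) = f(x[pi[0]], ..., x[pi[n-1]])."""
    return lambda x: f(tuple(x[i] for i in pi))

def orbit(f, n):
    """Truth tables of all functions isomorphic to f (exponential time)."""
    return {truth_table(permuted(f, pi), n) for pi in permutations(range(n))}

def num_isomorphic(f, n):
    return len(orbit(f, n))

def is_isomorphic(f, g, n):
    return truth_table(g, n) in orbit(f, n)
```

On four variables, a symmetric threshold function has a single isomorphic copy, a single-variable function has 4, and the parity of two variables has 6, which matches the folklore bounds above.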

The generalized folklore observation implies that when $f$ is a $k$-junta, it is possible to
$\epsilon$-test $f$-isomorphism with $O(\log(\binom{n}{k} k!)/\epsilon) = O(k \log(n)/\epsilon)$ queries. Fischer et al. [52]
improved on this result by showing that when $f$ is a $k$-junta, it is possible to $\epsilon$-test
$f$-isomorphism with only $\mathrm{poly}(k, 1/\epsilon)$ queries. In other words, when $f$ is a junta (with a
constant number of relevant variables), then $f$ is efficiently isomorphism-testable.
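
The counting behind the folklore bound for juntas can be spelled out as follows. A function isomorphic to a $k$-junta is determined by which variables its $k$ relevant variables are mapped to, and in which order, so there are at most $\binom{n}{k} k!$ such functions, and

```latex
\[
\log\!\left(\binom{n}{k}\, k!\right)
  = \log \frac{n!}{(n-k)!}
  = \sum_{i=0}^{k-1} \log (n-i)
  \le k \log n .
\]
```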

The query complexity for testing isomorphism to juntas was sharpened by Chakraborty,
Garcia Soriano, and Matsliah [40]. By building on the junta test presented in Chapter 5, they
showed that it is possible to test isomorphism to $k$-juntas with $O(k \log k)$ queries.

At a qualitative level, the folklore observations and the results in [52, 40] show that
symmetric functions and junta functions are efficiently isomorphism-testable. When we
began the research project presented in this chapter, those were essentially the only
functions that were known to be efficiently isomorphism-testable.

Our result. It is not too hard to see that juntas and symmetric functions cannot be the
only functions that are efficiently isomorphism-testable. For example, consider the
function defined by $f(x) = x_1 \oplus \cdots \oplus x_{n-1}$. This function is not symmetric and has $n-1$
relevant variables. This function, however, can be represented as the combination of a
symmetric function and a junta function:

\[ f(x) = (x_1 \oplus \cdots \oplus x_n) \oplus x_n. \]

We can use this observation to build an $f$-isomorphism tester that requires only $O(1/\epsilon)$
queries: given some function $g : \{0,1\}^n \to \{0,1\}$, we test whether the function $g'$
obtained by setting

\[ g'(x) = (x_1 \oplus \cdots \oplus x_n) \oplus g(x) \]

is isomorphic to the function $f'(x) = x_n$. We leave it to the reader to verify that when $g$ is
isomorphic to $f$, then $g'$ is isomorphic to $f'$ and that when $g$ is $\epsilon$-far from isomorphic to $f$
then $g'$ is also $\epsilon$-far from isomorphic to $f'$.
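
The verification left to the reader can be checked exhaustively for a small $n$. The following sketch fixes $n = 4$ (an arbitrary choice), indexes variables from 0, and uses hypothetical helper names; it relies on the fact that XOR-ing two functions with the same parity function does not change the distance between them.

```python
from itertools import permutations, product

n = 4
INPUTS = list(product((0, 1), repeat=n))

def parity(x):
    return sum(x) % 2

def f(x):            # x_1 xor ... xor x_{n-1}
    return sum(x[:-1]) % 2

def f_prime(x):      # the single-variable function x_n
    return x[-1]

def permute(h, pi):
    return lambda x: h(tuple(x[i] for i in pi))

def dist(a, b):
    return sum(a(x) != b(x) for x in INPUTS) / len(INPUTS)

def dist_to_isomorphism(g, target):
    """Distance from g to the closest function isomorphic to target."""
    return min(dist(g, permute(target, pi)) for pi in permutations(range(n)))

def transform(g):    # g'(x) = (x_1 xor ... xor x_n) xor g(x)
    return lambda x: parity(x) ^ g(x)
```

For every permutation `pi`, `transform(permute(f, pi))` coincides with `permute(f_prime, pi)`, and for an arbitrary `g` the two values `dist_to_isomorphism(g, f)` and `dist_to_isomorphism(transform(g), f_prime)` agree.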

When examining why the function $f$ defined in the last paragraph is efficiently
isomorphism-testable, one may notice that it is a partially symmetric function. This leads
to the natural question: are all partially symmetric functions efficiently
isomorphism-testable? The main result of this chapter gives an affirmative answer to this question.

Theorem 8.1. For every $(n-k)$-symmetric function $f : \{0,1\}^n \to \{0,1\}$, there exists an
$\epsilon$-tester for $f$-isomorphism that requires only $O(k \log(k)/\epsilon^2)$ queries.

This  result  presents  a  unified  explanation  for  the  efficient  isomorphism-testability  of 
juntas  and  of  symmetric  functions.  Since  many  partially  symmetric  functions — such  as  the 
one  in  the  example  above — are  not  juntas  or  fully  symmetric,  it  also  significantly  extends 
the  set  of  functions  that  we  know  to  be  efficiently  isomorphism-testable. 

One  might  wonder  if,  in  turn,  the  set  of  partially  symmetric  functions  is  only  a  subset 
of  all  the  functions  that  are  efficiently  isomorphism-testable.  In  [26],  we  conjecture  that 
the  answer  to  this  question  is  essentially  “no”:  that  partial  symmetry  is  effectively  the 
characteristic  that  determines  whether  functions  are  efficiently  isomorphism-testable  or 
not.  This  conjecture  is  still  open;  see  [26]  for  the  details. 

The proof of Theorem 8.1 is constructive. In the next section, we introduce an
algorithm for efficiently testing isomorphism to partially symmetric functions. The analysis of
this algorithm, and thus the proof of the main theorem, follows in Section 8.2.


8.1  The  Algorithm 

The algorithm we introduce for testing isomorphism to partially symmetric functions
follows the general outline of the isomorphism tester for juntas introduced by Fischer et
al. [52] and refined by Chakraborty, Garcia Soriano, and Matsliah [40]. The algorithm
proceeds in two stages.

The first stage tests whether the given function is partially symmetric. We use the
PartiallySymmetricTest algorithm from Chapter 6 for this task, with one minor
modification: when that test accepts, it also returns the partition of $[n]$ that it defined and
the asymmetric parts that it identified.

The second stage of the algorithm verifies that, if the function is indeed partially
symmetric, it is consistent with the target function (up to relabeling of the input variables).
This second stage relies on an efficient "core sampler" to reduce the number of queries.

8.1.1  Core  sampler  for  partially  symmetric  functions 

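Let $f : \{0,1\}^n \to \{0,1\}$ be $J$-symmetric for some set $J \subseteq [n]$ of size $|J| = n-k$.
This $(n-k)$-symmetric function can be represented in a concise manner as the function
$f_{\mathrm{core}} : \{0,1\}^k \times \{0,1,\ldots,n-k\} \to \{0,1\}$, whose first argument is the assignment to the
$k$ asymmetric variables and whose second argument is the Hamming weight of the remaining
variables. We call the function $f_{\mathrm{core}}$ the core of $f$. Two $(n-k)$-symmetric functions
$f, g : \{0,1\}^n \to \{0,1\}$ are isomorphic iff their core functions are
isomorphic. Testing the latter condition can be done with an efficient sample extractor.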

In the following definition, let $\mathcal{D}^*_{k,n}$ be the distribution on pairs $(x, w) \in \{0,1\}^k \times
\{0,1,\ldots,n-k\}$ where $x$ is drawn from the uniform distribution over $\{0,1\}^k$ and $w$
is drawn independently from the binomial distribution $\mathrm{Bin}(n-k, \frac12)$. A perfect sample
extractor for a partially symmetric function draws an input from $\mathcal{D}^*_{k,n}$ and returns the value
of the core of that function on this input.

Definition 8.2. A perfect sampler for the $(n-k)$-symmetric function $f : \{0,1\}^n \to \{0,1\}$
is a randomized algorithm that queries $f$ on a single input and returns a triplet $(x, w, z) \in
\{0,1\}^k \times \{0,1,\ldots,n-k\} \times \{0,1\}$ where

1. $(x, w) \sim \mathcal{D}^*_{k,n}$; and

2. $z = f_{\mathrm{core}}(x, w)$.

If we know the identity of the set $J$ of $k$ asymmetric variables in the $(n-k)$-symmetric
function $f$, it is easy to design a perfect sampler for this function: draw $y \in \{0,1\}^n$
uniformly at random and return the triplet $(y_J, \|y_{\bar J}\|, f(y))$, where $\bar J = [n] \setminus J$. If we do
not know the exact set $J$ of asymmetric variables, however, it is much easier to design an
approximate sample extractor for the function.
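
When the asymmetric variables are known, the perfect sampler described above takes only a few lines. The sketch below is an illustration with hypothetical names; `example_f` and `example_core` are a made-up partially symmetric function and its core, not objects from the thesis.

```python
import random

def perfect_sampler(f, n, asym, rng=random):
    """Query f once and return (x, w, z) as in the definition of a perfect sampler.

    asym: indices of the k asymmetric variables (assumed known here).
    """
    y = tuple(rng.randint(0, 1) for _ in range(n))
    x = tuple(y[j] for j in asym)      # uniform over {0,1}^k
    w = sum(y) - sum(x)                # weight outside asym, Bin(n-k, 1/2)
    return x, w, f(y)

# Hypothetical example: f depends on variable 0 and on the weight of the rest.
def example_f(y):
    return y[0] ^ int(sum(y[1:]) >= 2)

def example_core(x, w):
    return x[0] ^ int(w >= 2)
```

Since `y` is uniform, its restriction to `asym` is uniform and the weight of the remaining coordinates is an independent binomial, so `(x, w)` has the required distribution and `z` always equals the core evaluated at `(x, w)`.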

Definition 8.3. A $\delta$-sampler for the $(n-k)$-symmetric function $f : \{0,1\}^n \to \{0,1\}$ is
a randomized algorithm that queries $f$ on a single input and returns a triplet $(x, w, z) \in
\{0,1\}^k \times \{0,1,\ldots,n-k\} \times \{0,1\}$ where

1. The distribution $\mathcal{D}$ of $(x, w)$ satisfies $d_{TV}(\mathcal{D}, \mathcal{D}^*_{k,n}) \le \delta$, and

2. $z = f_{\mathrm{core}}(x, w)$ with probability at least $1 - \delta$.

Given a partition $\mathcal{I} = \{I_1, \ldots, I_s\}$ of $[n]$, we say that $x \in \{0,1\}^n$ respects the partition
$\mathcal{I}$ if for every $1 \le i \le s-1$, each coordinate in $I_i$ has the same value in $x$ (i.e., if for
every such part $I_i$ we have either $x_{I_i} = \vec{1}$ or $x_{I_i} = \vec{0}$). We define $\mathcal{D}_{\mathcal{I}}$ to be the distribution over
$\{0,1\}^n$ obtained by the following procedure. First, we sample $w \sim \mathrm{Bin}(n, \frac12)$. If there
exist some elements $x \in \{0,1\}^n$ of Hamming weight $\|x\| = w$ that respect the partition
$\mathcal{I}$, we choose one of those elements uniformly at random and return it. If no such element
exists, we return the all-zeros input.

The following algorithm is an efficient (approximate) sampler for the core of a partially
symmetric function when it is given a partition that splits the asymmetric variables among
the sets $I_1, \ldots, I_{s-1}$.


SamplePSF($f, I_1, \ldots, I_s, J$)

1. Draw $y \sim \mathcal{D}_{\mathcal{I}}$.

2. Let $x \in \{0,1\}^k$ be the value assigned to the parts in $J$.

3. Return the triplet $(x, \|y\| - \|x\|, f(y))$.
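
A direct, brute-force rendering of this sampler is sketched below. It is meant for small $n$ only, since it enumerates all inputs of the drawn weight, and it assumes (as our own convention) that the last part of the partition is the unconstrained one and that every part indexed by `J` is non-empty. The helper names are hypothetical.

```python
import random
from itertools import product

def respects(x, parts):
    """True if x is constant on every part except the last one."""
    return all(len({x[i] for i in part}) <= 1 for part in parts[:-1])

def sample_D(n, parts, rng):
    """Draw from the distribution over inputs that respect the partition."""
    w = sum(rng.randint(0, 1) for _ in range(n))      # w ~ Bin(n, 1/2)
    candidates = [x for x in product((0, 1), repeat=n)
                  if sum(x) == w and respects(x, parts)]
    return rng.choice(candidates) if candidates else (0,) * n

def sample_psf(f, n, parts, J, rng):
    y = sample_D(n, parts, rng)
    x = tuple(y[parts[j][0]] for j in J)   # common value on each part in J
    return x, sum(y) - sum(x), f(y)
```

As a usage example, with `parts = [[0, 1], [2, 3], [4], [5, 6, 7]]` and `J = [2]`, a function whose only asymmetric variable is variable 4 is sampled exactly according to its core.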


8.1.2  The  isomorphism-testing  algorithm 

We are now ready to describe the algorithm for isomorphism testing of $(n-k)$-symmetric
functions. Given an $(n-k)$-symmetric function $f$, the following algorithm tests whether
the input function $g$ is isomorphic to $f$ or $\epsilon$-far from being so.


PSFIsoTest($f, k, g, \epsilon$)

1. If PartiallySymmetricTest($g, k, \epsilon/1000$) does not accept, reject.

2. Let $\mathcal{I} = \{I_1, \ldots, I_s\}$ and $J \subseteq [s-1]$ be the partition and asymmetric
parts defined by PartiallySymmetricTest.

3. For $i = 1, \ldots, \Theta(k \log(k)/\epsilon^2)$,

3.1. Draw $z^{(i)} \leftarrow$ SamplePSF($g, I_1, \ldots, I_s, J$).

4. Accept iff at least a $1 - \frac{\epsilon}{2}$ fraction of the triplets $z^{(i)}$
are consistent with the core function of some isomorphism of $f$.
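
The final acceptance step can be sketched as a search over the $k!$ ways of matching the identified parts to the asymmetric variables of $f$. This is only an illustrative brute-force version with hypothetical names; it assumes the triplets $(x, w, z)$ and a callable `f_core(x, w)` for the core of $f$ are already available.

```python
from itertools import permutations

def consistent_fraction(triplets, f_core, sigma):
    """Fraction of triplets (x, w, z) with z == f_core(x reordered by sigma, w)."""
    hits = sum(z == f_core(tuple(x[i] for i in sigma), w) for x, w, z in triplets)
    return hits / len(triplets)

def accepts(triplets, f_core, k, eps):
    """Accept iff some ordering of the k parts explains a 1 - eps/2 fraction."""
    threshold = 1 - eps / 2
    return any(consistent_fraction(triplets, f_core, sigma) >= threshold
               for sigma in permutations(range(k)))
```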

8.2 Analysis of the Algorithm


We  begin  by  establishing  some  technical  properties  about  the  distributions  defined  in  the 
last  section. 

Proposition 8.4. Let $J = \{j_1, \ldots, j_k\} \subseteq [n]$ be a set of size $k$, and $r = \Omega(k^2)$ be odd. If
$x \sim \mathcal{D}_{\mathcal{I}}^{W}$ for a random partition $\mathcal{I}$ of $[n]$ into $r$ parts and a random workspace $W \in \mathcal{I}$,
then

• $x$ is $o(1/n)$-close to being uniform over $\{0,1\}^n$, and

• $(x_J, \|x_{\bar J}\|)$ is $c/k$-close to being distributed according to $\mathcal{D}^*_{k,n}$, for our choice of
$0 < c < 1$.

Proof. We start with the first part of the proposition, showing that $x$ is almost uniform.
Consider the following procedure to generate a random $\mathcal{I}$, $W$ and $x$. We draw a random
Hamming weight $w \sim B_{n,1/2}$ and define $x'$ to be the input consisting of $w$ ones followed by
$n-w$ zeros. We choose a random partition $\mathcal{I}'$ of $[n]$ into $r$ consecutive parts $I_1, \ldots, I_r$ (i.e.,
$I_1 = \{1, 2, \ldots, |I_1|\}, \ldots, I_r = \{n - |I_r| + 1, \ldots, n\}$) according to the typical distribution
of sizes in a random partition. Let the workspace $W'$ be the only part which contains the
coordinate $w$ (or $I_1$ if $w = 0$). We now apply a random permutation over $x'$, $\mathcal{I}'$ and $W'$ to
get $x$, $\mathcal{I}$ and $W$.

The above procedure outputs a random element $x$ that is uniformly distributed over
$\{0,1\}^n$. The choice of $\mathcal{I}$ was also done at random, considering the applied permutation
over $\mathcal{I}'$. The only difference is then in the choice of the workspace $W$, which can only
be reflected in its size. However, when $r = o(\sqrt n)$ we will choose the middle part as the
workspace with probability $1 - o(1)$, regardless of its size. In the remaining cases, since
there are $n/r = \Omega(\sqrt n)$ parts, the possible parts to be chosen as workspace are a small
fraction among all parts, and therefore $W$ would be $o(1)$-close to being a random part.

To prove the second property of the proposition, we also consider two cases. When
$r = o(\sqrt n)$, with probability $1 - o(1)$, the workspace would have size $\omega(\sqrt n)$ and also
$w = n/2 + O(\sqrt n)$. In such a case, the $r-1$ parts (excluding the workspace) would be
half zeros and half ones, and the marginal distribution over the number of ones in $J$ would
be $\mathcal{H}_{r-1,(r-1)/2,k}$ (assuming the elements of $J$ are separated by $\mathcal{I}$, which happens with
probability $1 - o(1)$). By Lemma 4.3, the distance between this distribution and $B_{k,1/2}$ is
bounded by $k/r \le c/k$ for our choice of $0 < c < 1$. Since there is no restriction on the
ordering of the sets, this is also the distance from uniform over $\{0,1\}^k$ as required.

In the remaining case where $r = \Omega(\sqrt n)$, we can use the same arguments and also
apply Lemma 4.4 with the distributions $B_{k,1/2}$ and $B_{k,1/2+\delta}$ for $\delta = O(1/\sqrt n)$, implying that
the distance between these two distributions is at most $o(1)$. Combining this with the
distance to $\mathcal{H}_{r-1,(r-1)(1/2+\delta),k}$ we get again a total distance of $k/r + o(1) \le c/k$ for our
choice of $0 < c < 1$. □


We now establish a basic fact regarding the functions that are accepted by the
PartiallySymmetricTest.

Lemma 8.5. Let $g$ be a function $\epsilon$-close to being $(n-k)$-symmetric which passed the
PartiallySymmetricTest($g, k, \epsilon$). In addition, let $\mathcal{I}$, $W$ and $J$ be the partition, workspace
and identified parts used by the algorithm. With probability at least 9/10, there exists a
function $h$ which satisfies the following properties.

• $h$ is $4\epsilon$-close to $g$, and

• $h$ is $(n-k)$-symmetric, with asymmetric variables that are contained in $J$ and separated
by $\mathcal{I}$.

Proof. Let $g^*$ be the $(n-k)$-symmetric function closest to $g$ (which can be $f$ itself, up
to some isomorphism) and $R$ be the set of (at most) $k$ asymmetric variables of $g^*$. By
Lemma 6.3 and our assumption over $g$,

\[ \mathrm{SymInf}_g(\bar R) \le 2 \cdot \mathrm{dist}(g, g^*) \le 2\epsilon. \]

Notice however that $R$ is not necessarily contained in $J$ and therefore $g^*$ is not a good
enough candidate for $h$. Let $U = R \cap J$ be the intersection of the asymmetric variables of
$g^*$ and the sets identified by the algorithm. In order to show that $g$ is also close to being
$\bar U$-symmetric, we bound $\mathrm{SymInf}_g(\bar U)$ using Lemma 6.6 with the sets $R$ and $J$. Notice that
since $|R| \le k$ and $|J| \le 2kn/r \le \epsilon^2 n/d$ for our choice of $d$, we can bound the error term
(in the notation of Lemma 6.6) by $c\sqrt{\gamma} \le c\sqrt{\epsilon^2/d} \le \epsilon$. We therefore have

\[ \mathrm{SymInf}_g(\bar U) \le \mathrm{SymInf}_g(\bar R) + \mathrm{SymInf}_g(\bar J) + \epsilon \le 2\epsilon + \epsilon + \epsilon = 4\epsilon, \]

where we know $\mathrm{SymInf}_g(\bar J) \le \epsilon$ with probability at least 19/20 as the algorithm did not
reject.

By applying Lemma 6.3 again, we know there exists a $\bar U$-symmetric function $h$ whose
distance to $g$ is bounded by $\mathrm{dist}(g, h) \le 4\epsilon$. Moreover, with probability at least 19/20,
all its asymmetric variables are completely separated by the partition $\mathcal{I}$ (and they were all
identified as part of $J$). □

We now generalize a recent result of Chakraborty et al. [39] concerning efficient
algorithms for sampling the core of juntas.

Theorem 8.6. Let $f : \{0,1\}^n \to \{0,1\}$ be $(n-k)$-symmetric with $k < n/10$. There is an
algorithm that queries $f$ on $O(\frac{k}{\eta\delta} \log \frac{k}{\eta\delta})$ inputs and with probability at least $1 - \eta$ outputs
a $\delta$-sampler for $f$.


Proof of Theorem 8.6. The algorithm for generating the sampler is described by
PartiallySymmetricSampler, which performs $O(\frac{k}{\eta\delta} \log \frac{k}{\eta\delta})$ preprocessing queries to the
function. What remains to be proved is that, with good probability, the algorithm
returns a valid sampler.

Let $h$ be the function defined in the analysis of Theorem 8.1, which satisfies the
conditions of Lemma 8.5. Recall that its asymmetric variables were separated by $\mathcal{I}$ and appear
in $J$. Following this analysis and that of PartiallySymmetricTest, one can see that
with probability at least $1 - \eta$ we would not reject $f$ when calling
PartiallySymmetricTest. Moreover, the samples would be $\delta/2$-close to sampling the core of $h$, which is
by itself $\delta/2$-close to $f$. Therefore, overall our samples would be $\delta$-close to sampling the
core of $f$.

The last part in completing the proof of the theorem is showing that we sample the core
with distribution $\delta$-close to $\mathcal{D}^*_{k,n}$. By Proposition 8.4, the total variation distance between
sampling the core according to $\mathcal{D}^*_{k,n}$ and sampling it according to $\mathcal{D}_{\mathcal{I}}^{W}$ is at most $c/k$ for
our choice of $0 < c < 1$, which we can choose to be at most $\delta$. □

Notice that if the function $f$ is not $(n-k)$-symmetric but still very close (say
$(k/\eta\delta)^{-2}$-close), applying the same algorithm will provide a good sampler for an $(n-k)$-symmetric
function $f'$ close to $f$. The main reason is that most likely, we will not query any location
of the function where it does not agree with $f'$.

Finally,  we  are  ready  to  complete  the  analysis  of  PSFIsoTest. 


Proof of Theorem 8.1. Before analyzing the algorithm we just described, we consider the
case where $k > n/10$. Since Theorem 6.1 does not hold for such values of $k$, we apply the basic
algorithm of $O(n \log n/\epsilon)$ random queries, which is applicable to testing isomorphism of
any given function (since there are $n!$ possible isomorphisms, the random queries will rule
out all of them with good probability, assuming we should reject). Since $k = \Omega(n)$, the
complexity of this algorithm fits the statement of our theorem.

We start by analyzing the query complexity of the algorithm. The PartiallySymmetricTest
step performs $O(\frac{k}{\epsilon} \log \frac{k}{\epsilon})$ queries, and therefore the majority of the queries are
performed at the sampling stage, resulting in $O(k \log k/\epsilon^2)$ queries as required. In order
to prove the correctness of the algorithm, we consider the following cases.

• $g$ is $\epsilon$-far from being isomorphic to $f$ and $\epsilon/1000$-far from being $(n-k)$-symmetric.

• $g$ is $\epsilon$-far from being isomorphic to $f$ but $\epsilon/1000$-close to being $(n-k)$-symmetric.

• $g$ is isomorphic to $f$.

In the first case, with probability at least 9/10, PartiallySymmetricTest will reject
and so will we, as required. We assume from this point on that
PartiallySymmetricTest did not reject, as it will only reject $g$ which is isomorphic to $f$ with probability
at most 1/10, and that we are not in the first case. Notice that these cases match the
conditions of Lemma 8.5, and therefore from this point onward we assume there exists an $h$
satisfying the lemma's properties (remembering we applied the algorithm with $\epsilon/1000$).

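In order to bound the distance between $h$ and $g$ in our samples, we use Proposition 8.4,
indicating

\[ \Pr_{\mathcal{I},\, W,\, x \sim \mathcal{D}_{\mathcal{I}}^{W}} [g(x) \neq h(x)] = \mathrm{dist}(g, h) + o(1/n). \]

By Markov's inequality, with probability at least 9/10, the partition $\mathcal{I}$ and the workspace
$W$ satisfy

\[ \Pr_{x \sim \mathcal{D}_{\mathcal{I}}^{W}} [g(x) \neq h(x)] \le 10 \cdot \mathrm{dist}(g, h) + o(1/n) \le 10 \cdot 4\epsilon/1000 + o(1/n) < \epsilon/20. \]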


By Proposition 8.4, if we were to sample $h$ according to $\mathcal{D}_{\mathcal{I}}^{W}$, it should be $\epsilon/20$-close
to sampling its core (assuming the partition size is large enough). Combined with the
distance between $g$ and $h$ in our samples, we expect our samples to be $\epsilon/20 + \epsilon/20 = \epsilon/10$
close to sampling $h$'s core.

The last part of the proof is showing that there would be an almost consistent
isomorphism of $f$ only when $g$ is isomorphic to $f$. Notice however that we care only for
isomorphisms which map the asymmetric variables of $f$ to the $k$ sets of $J$. Therefore, the
number of different isomorphisms we need to consider is $k!$.

Assume we are in the second case and $g$ is $\epsilon$-far from being isomorphic to $f$. Let $f_\pi$ be
some isomorphism of $f$. By our assumptions and Lemma 8.5,

\[ \mathrm{dist}(f_\pi, h) \ge \mathrm{dist}(f_\pi, g) - \mathrm{dist}(g, h) \ge \epsilon - \epsilon/250. \]

Each sample we perform would be inconsistent with $f_\pi$ with probability at least
$\epsilon - \epsilon/250 - \epsilon/10 > 8\epsilon/9$. By the Chernoff bounds and the union bound, if we perform
$q = O(k \log k/\epsilon^2)$ queries, we would rule out all $k!$ possible isomorphisms with probability at
least 9/10 and reject the function as required.
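
One way to see where a sample bound of this form comes from is the following calculation. It instantiates the Chernoff bound with Hoeffding's inequality, which is our own choice for this sketch, so the constants are illustrative only. Here $q$ is the number of samples, each inconsistent with a fixed $f_\pi$ with probability at least $8\epsilon/9$, and acceptance requires at most an $\epsilon/2$ fraction of inconsistent samples.

```latex
\begin{align*}
\Pr\bigl[\text{at most an $\tfrac{\epsilon}{2}$ fraction of the $q$ samples is inconsistent with $f_\pi$}\bigr]
  &\le \exp\!\bigl(-2q\,(\tfrac{8\epsilon}{9} - \tfrac{\epsilon}{2})^2\bigr)
   = \exp\!\bigl(-\tfrac{49}{162}\, q\, \epsilon^2\bigr), \\
k! \cdot \exp\!\bigl(-\tfrac{49}{162}\, q\, \epsilon^2\bigr) \le \tfrac{1}{10}
  &\quad\text{whenever}\quad
  q \ge \tfrac{162}{49\,\epsilon^2}\,\bigl(\ln k! + \ln 10\bigr).
\end{align*}
```

Since $\ln k! \le k \ln k$, a number of samples of order $k \log k/\epsilon^2$ suffices for the union bound over the $k!$ candidate isomorphisms.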

On the other hand, if $g$ is isomorphic to $f$, then we know there exists with probability
at least 9/10 some isomorphism $f_\pi$ which maps the asymmetric variables of $f$ into the sets
of $J$, such that

\[ \mathrm{dist}(f_\pi, h) \le \mathrm{dist}(f_\pi, g) + \mathrm{dist}(g, h) \le \epsilon/500 + \epsilon/250. \]

For this isomorphism, with high probability much more than a $(1 - \epsilon/2)$-fraction of the
queries would be consistent and we would therefore accept $g$ as we should. □

Chapter 9

Nearly Universal Lower Bound for Testing Isomorphism

In the last chapter, we saw that for every function $f : \{0,1\}^n \to \{0,1\}$, it is possible to
$\epsilon$-test $f$-isomorphism with $O(n \log(n)/\epsilon)$ queries. We also saw that for some functions
(specifically, for partially symmetric functions) that universal upper bound is far from
tight. The goal of this chapter is to see if this universal upper bound is tight, or nearly
tight, for any functions, or whether $f$-isomorphism can be tested much more efficiently
for every function $f$.

In the most extreme case, we might ask whether $f$-isomorphism can be tested with a
constant number of queries for every function $f$. Fischer et al. [52] showed that this is
not the case. Specifically, they showed that for every $k = o(\sqrt n)$, testing isomorphism
to the $k$-linear function $f(x) = x_1 \oplus \cdots \oplus x_k$ requires a number of queries that depends
on $k$. So when $\omega(1) \le k \le o(\sqrt n)$, testing isomorphism to $k$-linear functions requires a
super-constant number of queries.

Until recently, that was the only lower bound known for the problem of testing function
isomorphism. In this chapter, we provide a significantly stronger lower bound. We show
that the universal upper bound of $O(n \log n)$ queries is tight, up to logarithmic factors, for
almost every boolean function. More precisely, we establish the following lower bound.

Theorem  9.1.  Fix  0  <  e  <  ^.  For  a  1  — o(l)  fraction  of  the  functions  f  :  {0, 1}"  — >  {0, 1}, 
any  non-adaptive  algorithm  for  e-testing  isomorphism  to  f  must  make  at  least  //  queries. 

The  proof  of  the  theorem  that  we  present  in  this  chapter  is  non-constructive:  we  show 
that  if  we  pick  a  boolean  function  uniformly  at  random,  then  with  probability  1  —  o(l), 
testing  isomorphism  to  that  function  requires  at  least  //  queries. 

9.1 The Lower Bound


The proof of Theorem 9.1 uses Yao's Minimax Principle [100]. For a fixed function $f$
we introduce two distributions $\mathcal{F}_{\mathrm{yes}}$ and $\mathcal{F}_{\mathrm{no}}$ such that a function $g \sim \mathcal{F}_{\mathrm{yes}}$ is isomorphic
to $f$ and a function $g \sim \mathcal{F}_{\mathrm{no}}$ is $\epsilon$-far from isomorphic to $f$ with high probability. We
then show that for most choices of $f$, deterministic non-adaptive testing algorithms cannot
distinguish  functions  drawn  from  either  of  these  distributions  with  only  ^  queries. 

We define $\mathcal{F}_{\mathrm{yes}}$ to be the uniform distribution over functions isomorphic to $f$. In other
words, we draw a function $g \sim \mathcal{F}_{\mathrm{yes}}$ by choosing $\pi \in S_n$ uniformly at random and setting
$g = f_\pi$.

A first idea for $\mathcal{F}_{\mathrm{no}}$ may be to make it the uniform distribution over all boolean
functions $\{0,1\}^n \to \{0,1\}$. This idea does not quite work, since, for example, a random
function differs from $f$ and all functions isomorphic to it on the all 0 input or the all 1
input with probability at least 3/4. However, a simple modification of this idea does work:
to draw a function $g \sim \mathcal{F}_{\mathrm{no}}$, we choose a permutation $\pi \in S_n$ uniformly at random and
we choose a function $g_{\mathrm{rand}}$ uniformly at random from all boolean functions on $n$ variables.
We then let $g$ be the function defined by


g(x) 


fendO)  if  f  <  |M|  <  , 

fn(x)  otherwise. 


With  high  probability,  a  function  g  ~  FU(>  is  far  from  isomorphic  to  /. 
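As a concrete illustration of the two distributions, the following Python sketch draws truth tables from $\mathcal{F}_{\mathrm{yes}}$ and $\mathcal{F}_{\mathrm{no}}$ for small $n$. It assumes the convention $f_\pi(x) = f(x_{\pi(1)}, \ldots, x_{\pi(n)})$; the function names and the dictionary representation of truth tables are illustrative choices and do not come from the text.

```python
import itertools
import random

def all_inputs(n):
    """All points of {0,1}^n as tuples."""
    return list(itertools.product((0, 1), repeat=n))

def permute(f, pi):
    """Truth table of f_pi, assuming f_pi(x) = f(x_pi(1), ..., x_pi(n))."""
    n = len(pi)
    return {x: f[tuple(x[pi[i]] for i in range(n))] for x in all_inputs(n)}

def is_balanced(x):
    """True iff n/3 <= |x| <= 2n/3 (integer arithmetic avoids rounding)."""
    n, weight = len(x), sum(x)
    return n <= 3 * weight <= 2 * n

def draw_yes(f, n, rng):
    """g ~ F_yes: apply a uniformly random permutation of the variables to f."""
    pi = list(range(n))
    rng.shuffle(pi)
    return permute(f, pi)

def draw_no(f, n, rng):
    """g ~ F_no: uniformly random on balanced inputs, equal to f_pi elsewhere."""
    g = draw_yes(f, n, rng)
    for x in g:
        if is_balanced(x):
            g[x] = rng.randint(0, 1)
    return g
```

Since permuting variables preserves Hamming weight, a draw from either distribution has the same number of 1-values as $f$ on every unbalanced weight level, which is a quick sanity check on the sampler.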

Proposition 9.2. Fix $0 < \epsilon < \frac{1}{2}$. For any function $f : \{0,1\}^n \to \{0,1\}$, the function
$g \sim \mathcal{F}_{\mathrm{no}}$ is $\epsilon$-close to isomorphic to $f$ with probability at most $o(1)$.


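Proof. Fix any permutation $\pi \in S_n$. Let $g_{\mathrm{rand}}$ be the random function generated in the
draw of $g \sim \mathcal{F}_{\mathrm{no}}$. By the triangle inequality,

$$ \mathrm{dist}(g, f_\pi) \ge \mathrm{dist}(g_{\mathrm{rand}}, f_\pi) - \mathrm{dist}(g, g_{\mathrm{rand}}). $$

Since $g$ and $g_{\mathrm{rand}}$ differ only on unbalanced inputs, which form an $o(1)$ fraction of $\{0,1\}^n$,
we have $\mathrm{dist}(g, g_{\mathrm{rand}}) \le o(1)$. To complete the proof it therefore suffices to fix
$\epsilon < \epsilon' < \frac{1}{2}$ and show that $\mathrm{dist}(g_{\mathrm{rand}}, f_\pi) > \epsilon'$ with high probability.

Let $\eta = 1 - 2\epsilon'$. For any $x \in \{0,1\}^n$, $g_{\mathrm{rand}}(x) = f_\pi(x)$ with probability $\frac{1}{2}$, so
$\mathrm{E}[\mathrm{dist}(g_{\mathrm{rand}}, f_\pi)] = \frac{1}{2}$. By Chernoff's bound (see, e.g., Appendix A in [8]),

$$ \Pr[\mathrm{dist}(g_{\mathrm{rand}}, f_\pi) \le \epsilon'] = \Pr[\mathrm{dist}(g_{\mathrm{rand}}, f_\pi) \le (1 - \eta)\tfrac{1}{2}] \le e^{-2^n \eta^2/6} \le o(\tfrac{1}{n!}). $$

Taking the union bound over all choices of $\pi \in S_n$ completes the proof. □

Let $T$ be any deterministic non-adaptive algorithm that attempts to test $f$-isomorphism
with at most $\frac{n}{100}$ queries to an unknown function $g$. We will show that $T$ cannot reliably
distinguish between the cases where $g$ was drawn from $\mathcal{F}_{\mathrm{yes}}$ or from $\mathcal{F}_{\mathrm{no}}$.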

Let $Q \subseteq \{0,1\}^n$ be the set of queries performed by $T$ on $g$. We partition the queries
in $Q$ in two: the set $Q_b = \{q \in Q : \frac{n}{3} \le |q| \le \frac{2n}{3}\}$ of balanced queries, and the set
$Q_u = Q \setminus Q_b$ of unbalanced queries.

When $g$ is drawn from $\mathcal{F}_{\mathrm{yes}}$ or from $\mathcal{F}_{\mathrm{no}}$, the responses to the unbalanced queries $Q_u$
are consistent with some function $f_\pi$ isomorphic to $f$. Our next proposition shows that
when $T$ makes only $\frac{n}{100}$ queries to $g$, then in fact the responses to the unbalanced queries
will be consistent with many functions isomorphic to $f$. More precisely, define

$$ \Pi_f(g, Q_u) = \{\pi \in S_n : f_\pi(Q_u) = g(Q_u)\} $$

to be the set of permutations $\pi$ for which $f_\pi$ is consistent with the responses to the queries
$Q_u$. The following proposition shows that when the unknown function is drawn from $\mathcal{F}_{\mathrm{yes}}$
or from $\mathcal{F}_{\mathrm{no}}$, then with high probability the set $\Pi_f(g, Q_u)$ is large.

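Proposition 9.3. Let $Q_u$ be any set of unbalanced queries and let $g$ be a function drawn
from $\mathcal{F}_{\mathrm{yes}}$ or from $\mathcal{F}_{\mathrm{no}}$. Then for any $0 < t < 1$,

$$ \Pr\left[ |\Pi_f(g, Q_u)| < t \cdot \frac{n!}{2^{|Q_u|}} \right] \le t. $$

Proof. When $g \sim \mathcal{F}_{\mathrm{yes}}$ or $g \sim \mathcal{F}_{\mathrm{no}}$, then $g(x) = f_\pi(x)$ for every unbalanced input $x$, where
$\pi$ is chosen uniformly at random from $S_n$. So it suffices to show that
$\Pr_\pi\left[ |\Pi_f(f_\pi, Q_u)| < t \cdot \frac{n!}{2^{|Q_u|}} \right] \le t$.

For every $r \in \{0,1\}^{|Q_u|}$, let $S_r \subseteq S_n$ be the set of permutations $\sigma$ for which $f_\sigma(Q_u) = r$.
A set $S_r$ is small if $|S_r| < t \cdot \frac{n!}{2^{|Q_u|}}$. The union of all small sets covers at most
$2^{|Q_u|} \cdot t \cdot \frac{n!}{2^{|Q_u|}} = t\, n!$ permutations, so the probability that a randomly chosen permutation $\pi$ belongs to a
small set is at most $t$. □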
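The counting argument in this proof can be checked by brute force for very small $n$. The sketch below is only an illustration: the function name, the dictionary truth-table representation, and the convention $f_\pi(x) = f(x_{\pi(1)}, \ldots, x_{\pi(n)})$ are assumptions. It groups all $n!$ permutations by their response vector on $Q_u$ and measures the fraction that fall in a small class.

```python
import itertools
import math
from collections import Counter

def small_class_probability(f, queries, n, t):
    """Pr over uniform pi that |Pi_f(f_pi, Q_u)| < t * n! / 2^|Q_u| (brute force)."""
    perms = list(itertools.permutations(range(n)))
    # Response vector f_pi(Q_u) for every permutation pi.
    responses = [
        tuple(f[tuple(x[p[i]] for i in range(n))] for x in queries)
        for p in perms
    ]
    class_size = Counter(responses)  # |S_r| for each response vector r
    threshold = t * math.factorial(n) / 2 ** len(queries)
    in_small = sum(1 for r in responses if class_size[r] < threshold)
    return in_small / len(perms)
```

For any choice of $f$, $Q_u$ and $t$, the returned value should be at most $t$, exactly as the proposition states.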

The last proposition showed that when $g$ is drawn from $\mathcal{F}_{\mathrm{yes}}$ or from $\mathcal{F}_{\mathrm{no}}$, then with
high probability $\Pi_f(g, Q_u)$ is large; the next lemma shows that conditioned on $\Pi_f(g, Q_u)$
being large, the distribution on the responses to the balanced queries is nearly uniform,
even when $g \sim \mathcal{F}_{\mathrm{yes}}$. Specifically, given a function $f$ and a set $S$ of permutations, we
define the discrepancy of $f$ on $S$ to be

$$ \Delta_S(f) = \max_{\substack{Q_b :\, |Q_b| = \frac{n}{100} \\ r \in \{0,1\}^{|Q_b|}}} \left| \Pr_{\pi \in S}[f_\pi(Q_b) = r] - 2^{-\frac{n}{100}} \right|, $$

where the maximum is taken over sets $Q_b$ of balanced queries. We then define the discrepancy of $f$ to be

$$ \Delta(f) = \max_{\substack{Q_u :\, |Q_u| = \frac{n}{100} \\ \pi :\, |\Pi_f(f_\pi, Q_u)| \ge n!/2^{n/50}}} \Delta_{\Pi_f(f_\pi, Q_u)}(f). $$


The following lemma shows that $\Delta(f)$ is small for almost all functions $f$.

Lemma 9.4. When $f$ is drawn uniformly at random from the set of functions $\{0,1\}^n \to \{0,1\}$,

$$ \Pr\left[ \Delta(f) > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le 2^{-\Omega(2^{n/25})}. $$

We  prove  Lemma  9.4  in  the  next  section,  but  first  we  show  how  it  implies  Theorem  9.1. 

Theorem 9.1 (Restated). Fix $0 < \epsilon < \frac{1}{2}$. For a $1 - o(1)$ fraction of the functions
$f : \{0,1\}^n \to \{0,1\}$, any non-adaptive algorithm for $\epsilon$-testing isomorphism to $f$ must make
at least $\frac{n}{100}$ queries.

Proof. By Lemma 9.4, with probability at least $1 - 2^{-\Omega(2^{n/25})} = 1 - o(1)$, the discrepancy
of a randomly drawn function $f : \{0,1\}^n \to \{0,1\}$ is $\Delta(f) \le \frac{1}{3} \cdot 2^{-\frac{n}{100}}$. Fix $f$ to be any
function that satisfies this condition. We will show that testing isomorphism to $f$ requires
at least $\frac{n}{100}$ queries.

As discussed earlier, we complete the proof with Yao's Minimax Principle, with the
distributions $\mathcal{F}_{\mathrm{yes}}$ and $\mathcal{F}_{\mathrm{no}}$ as defined at the beginning of the section. Let $T$ be any
deterministic non-adaptive algorithm that makes at most $\frac{n}{100}$ queries to the input function $g$,
and let $Q = Q_u \cup Q_b$ represent the queries made by $T$. Without loss of generality, we can
assume $|Q_u| = |Q_b| = \frac{n}{100}$. (If $|Q_b| < \frac{n}{100}$, simply add extra balanced queries to $Q_b$; this
can only help $T$ determine whether $g$ was drawn from $\mathcal{F}_{\mathrm{yes}}$ or from $\mathcal{F}_{\mathrm{no}}$. Similarly, adding
unbalanced queries to $Q_u$ can only help $T$.)

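By Proposition 9.3, the probability that $|\Pi_f(g, Q_u)| < n!/2^{n/50}$ is at most $2^{-\frac{n}{100}} = o(1)$.
Assume, thus, that this event does not happen. Let $\mathcal{R}_{\mathrm{yes}}$ and $\mathcal{R}_{\mathrm{no}}$ be the distributions of the
responses to the balanced queries $Q_b$. Then the total variation distance between $\mathcal{R}_{\mathrm{yes}}$ and
$\mathcal{R}_{\mathrm{no}}$ is bounded by

$$ d_{TV}(\mathcal{R}_{\mathrm{yes}}, \mathcal{R}_{\mathrm{no}}) = \frac{1}{2} \sum_{r \in \{0,1\}^{n/100}} \left| \Pr_{\pi \in \Pi_f(g, Q_u)}[f_\pi(Q_b) = r] - 2^{-\frac{n}{100}} \right| \le \frac{1}{2} \cdot 2^{\frac{n}{100}} \Delta(f) \le \frac{1}{6}. \tag{9.1} $$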


Therefore, if $T$ accepts functions drawn from $\mathcal{F}_{\mathrm{yes}}$ with probability at least $\frac{2}{3}$, (9.1)
implies that $T$ also accepts functions drawn from $\mathcal{F}_{\mathrm{no}}$ with probability at least $\frac{2}{3} - \frac{1}{6} = \frac{1}{2}$.
But by Proposition 9.2, a function drawn from $\mathcal{F}_{\mathrm{no}}$ is $\epsilon$-far from isomorphic to $f$ with
probability $1 - o(1)$, so $T$ can't be a valid $\epsilon$-tester for isomorphism to $f$. □


9.2  Proof  of  Lemma  9.4 


The first step in the proof of Lemma 9.4 is to show that for any sufficiently small set $Q$
of balanced queries and sufficiently large set $S$ of permutations, the set $\{\pi(Q)\}_{\pi \in S}$ can be
partitioned into a number of large pairwise disjoint sets. The proof of this claim uses the
celebrated theorem of Hajnal and Szemerédi [60] introduced in Section 4.2.2.

Lemma 9.5. Let $S$ be a set of at least $n!/2^{n/50}$ permutations on $[n]$, and let $Q_b$ be a set of at
most $\frac{n}{100}$ balanced queries. Then there exists a partition $S_1 \cup \cdots \cup S_k$ of the permutations
in $S$ such that for $i = 1, 2, \ldots, k$,

(i) $|S_i| \ge 2^{n/20}$, and

(ii) The sets $\{\pi(Q_b)\}_{\pi \in S_i}$ are pairwise disjoint.


Proof. Construct a graph $G$ on $S$ where two permutations $\sigma, \tau$ are adjacent iff there exist
$u, v \in Q_b$ such that $\sigma(u) = \tau(v)$. By this construction, when $T$ is a set of permutations
that form an independent set in $G$, then the sets $\{\pi(Q_b)\}_{\pi \in T}$ are pairwise disjoint.

Consider a fixed permutation $\sigma \in S$. A second permutation $\tau$ is adjacent to $\sigma$ in $G$ iff
there are two vectors $u, v$ in $Q_b$ such that the permutation $\tau\sigma^{-1}$ maps the indices where $u$
has value 1 to the indices where $v$ has value 1 as well. There are at most $(\frac{n}{100})^2$ ways to
choose $u, v \in Q_b$ and at most $|u|!\,(n - |u|)!$ ways to satisfy the mapping condition, so the
graph has degree at most

$$ \max_{\frac{n}{3} \le k \le \frac{2n}{3}} \left(\tfrac{n}{100}\right)^2 \cdot k!\,(n - k)! \le \left(\tfrac{n}{100}\right)^2 \cdot \frac{n!}{\binom{n}{n/3}} \le \frac{n!}{2^{cn}} $$

for a constant $c = 1 - H_2(\frac{1}{3}) - o(1) > 0.07$.¹ Therefore, by the Hajnal–Szemerédi
Theorem, $G$ can be colored with $n!/2^{0.07n}$ colors, with each color class having size at least

$$ \frac{n!/2^{n/50}}{n!/2^{0.07n}} = 2^{n/20}. \qquad \square $$

¹ $H_2(p)$ represents the binary entropy of $p$. $H_2(\frac{1}{3}) \approx 0.918$.

Lemma 9.5 is useful because most functions $f$ have low discrepancy on large pairwise
disjoint sets.
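The numerical constants used in the proof of Lemma 9.5 are easy to confirm. The short Python check below is only a sanity check on the arithmetic (the variable names are illustrative): it evaluates the binary entropy $H_2(\frac{1}{3})$, the slack $1 - H_2(\frac{1}{3})$, and the exponent $0.07 - \frac{1}{50}$ that yields color classes of size $2^{n/20}$.

```python
import math

def binary_entropy(p):
    """H_2(p) = -p log2(p) - (1 - p) log2(1 - p) for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

h = binary_entropy(1 / 3)        # footnote value, roughly 0.918
slack = 1 - h                    # the constant c before the o(1) correction
class_exponent = 0.07 - 1 / 50   # color classes have size 2^(class_exponent * n)
```

Here `slack` exceeds 0.07 and `class_exponent` equals $\frac{1}{20}$ up to floating-point rounding, matching the bound $|S_i| \ge 2^{n/20}$.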

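Lemma 9.6. Fix $Q_b$ to be a set of $\frac{n}{100}$ balanced queries and fix $r \in \{0,1\}^{\frac{n}{100}}$. Let $S$ be a
fixed set of at least $2^{n/20}$ permutations such that the sets $\{\pi(Q_b)\}_{\pi \in S}$ are pairwise disjoint.
Then

$$ \Pr_f\left[ \left| \Pr_{\pi \in S}[f_\pi(Q_b) = r] - 2^{-\frac{n}{100}} \right| > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le 2^{-\Omega(2^{n/25})}. $$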

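Proof. For every function $f : \{0,1\}^n \to \{0,1\}$ and every permutation $\pi$ of $[n]$, define the
indicator random variable

$$ X_{f,\pi} = \begin{cases} 1 & \text{if } f_\pi(Q_b) = r, \\ 0 & \text{otherwise.} \end{cases} $$

When $f$ is chosen uniformly at random from the set of all boolean functions $\{0,1\}^n \to \{0,1\}$,
$\mathrm{E}_f[X_{f,\pi}] = \Pr_f[f_\pi(Q_b) = r] = 2^{-\frac{n}{100}}$, so

$$ \mathrm{E}_f\left[ \Pr_{\pi \in S}[f_\pi(Q_b) = r] \right] = \frac{1}{|S|} \sum_{\pi \in S} \mathrm{E}_f[X_{f,\pi}] = 2^{-\frac{n}{100}}. $$

Furthermore, the pairwise disjointness property of $S$ guarantees that the indicator variables
$X_{f,\pi}$, $\pi \in S$, depend on disjoint sets of values of $f$ and are therefore independent. Since
$|S| \ge 2^{n/20}$, Chernoff's bound gives

$$ \Pr_f\left[ \left| \Pr_{\pi \in S}[f_\pi(Q_b) = r] - 2^{-\frac{n}{100}} \right| > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le e^{-\Omega(|S| \cdot 2^{-n/100})} \le 2^{-\Omega(2^{n/25})}. \qquad \square $$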

The  proof  of  Lemma  9.4  can  now  be  completed  as  follows. 

Lemma 9.4 (Restated). When $f$ is drawn uniformly at random from the set of functions
$\{0,1\}^n \to \{0,1\}$,

$$ \Pr\left[ \Delta(f) > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le 2^{-\Omega(2^{n/25})}. $$

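Proof. Fix a permutation $\pi$ and a set $Q_u$ of $\frac{n}{100}$ unbalanced queries such that $|\Pi_f(f_\pi, Q_u)| \ge n!/2^{n/50}$.
Let $S = \Pi_f(f_\pi, Q_u)$, and fix a set $Q_b$ of $\frac{n}{100}$ balanced queries.

By Lemma 9.5, there exists a partition $S_1 \cup \cdots \cup S_k$ of $S$ such that for each part $S_i$,
$|S_i| \ge 2^{n/20}$ and the sets $\{\pi(Q_b)\}_{\pi \in S_i}$ are pairwise disjoint. By Lemma 9.6, for every set $S_i$ in the
partition,

$$ \Pr_f\left[ \left| \Pr_{\pi \in S_i}[f_\pi(Q_b) = r] - 2^{-\frac{n}{100}} \right| > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le 2^{-\Omega(2^{n/25})}. $$

Taking the union bound over all $k \le n!$ sets $S_i$, we get that

$$ \Pr_f\left[ \left| \Pr_{\pi \in S}[f_\pi(Q_b) = r] - 2^{-\frac{n}{100}} \right| > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le n! \cdot 2^{-\Omega(2^{n/25})}. $$

Applying a union bound once again, this time over all $\binom{2^n}{n/100} \le 2^{n^2/100}$ choices of $Q_b$ and
$2^{\frac{n}{100}}$ choices for $r$, we obtain

$$ \Pr_f\left[ \Delta_S(f) > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le 2^{\frac{n^2}{100} + \frac{n}{100}} \cdot n! \cdot 2^{-\Omega(2^{n/25})}. $$

Finally, applying the union bound one last time over the $n!$ choices for $\pi$ and the at most $2^{n^2/100}$
choices for $Q_u$, we get

$$ \Pr_f\left[ \Delta(f) > \tfrac{1}{3} \cdot 2^{-\frac{n}{100}} \right] \le 2^{\frac{n^2}{50} + \frac{n}{100}} \cdot (n!)^2 \cdot 2^{-\Omega(2^{n/25})} = 2^{-\Omega(2^{n/25})}. \qquad \square $$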


9.3  Notes  and  Discussion 

Testing isomorphism to juntas. By combining the result in this chapter with independent
and overlapping results of Chakraborty, Garcia Soriano, and Matsliah [40], we obtain
a generalization of Theorem 9.1. Specifically, we obtain a lower bound showing that
for almost all functions $f$ that are $k$-juntas, testing $f$-isomorphism requires at least $\Omega(k)$
queries. The details of this result are found in [4].

One implication of the generalized statement is that the upper bound of $O(k \log k)$
queries in Theorem 8.1 on the query complexity for testing isomorphism to $(n - k)$-symmetric
functions cannot be improved by more than a logarithmic factor since there
are (many) $(n - k)$-symmetric functions that require $\Omega(k)$ queries to test.


Optimality of the lower bound. There is a logarithmic gap between the lower bound of
Theorem 9.1 and the universal upper bound described at the beginning of the last chapter.
It is quite possible that the gap is only a byproduct of the limitations of our proof argument
and that $\Theta(n \log n)$ queries are indeed required to test $f$-isomorphism for almost every
function $f : \{0,1\}^n \to \{0,1\}$. That problem remains open.


Testing isomorphism to linear functions. The result in this chapter is non-constructive:
while it states that for almost all functions $f$, testing $f$-isomorphism requires at least $\Omega(n)$
queries, it does not identify any concrete functions for which this lower bound applies.
Our result in Chapter 7 provides an example of a concrete function for which the same
lower bound applies. Consider $f(x) = x_1 \oplus \cdots \oplus x_{n/2}$. The set of functions that are
isomorphic to $f$ is exactly the set of $\frac{n}{2}$-linear functions. Therefore, Theorem 7.1 implies
that testing $f$-isomorphism in this case requires $\frac{n}{2} - o(n)$ queries.


Chapter  10 

Testing  Isomorphism  to  Juntas 


The first two chapters of this part of the thesis gave some partial results related to the
characterization of the set of functions $f$ for which it is possible to test $f$-isomorphism with a
constant number of queries. Chapter 8's main result was a sufficient condition for efficient
isomorphism-testability: all partially symmetric functions are efficiently isomorphism-testable.

The main result of Chapter 9 gave a strong lower bound in the sense that the set of
functions that are efficiently isomorphism-testable (or, indeed, that can be tested with $o(n)$
queries) contains only a $o(1)$ fraction of all boolean functions. Being non-constructive,
however, that result did not identify specific characteristics of the set of functions that
are not efficiently isomorphism-testable. The goal of this chapter is to establish such a
characteristic. In other words, we want to identify a concrete class of functions that are
not efficiently isomorphism-testable.

As we mentioned at the beginning of the last chapter, until recently the only class
of functions that were known to not be efficiently isomorphism-testable was the set of
$k$-linear functions for $\omega(1) \le k \le o(\sqrt{n})$ [52]. Fischer et al. conjectured that these functions
were all contained in a much larger class of functions that are not efficiently isomorphism-testable.
Specifically, they conjectured that if $n$ is sufficiently large compared to $k$ and
$f : \{0,1\}^n \to \{0,1\}$ is a $k$-junta that is $\epsilon$-far from all $(k - 1)$-juntas, then any $\epsilon$-tester for
$f$-isomorphism requires a number of queries that depends on $k$.

The main result of this chapter confirms Fischer et al.'s conjecture. In fact, we do
more: we show that for every function $f$ that is a $k$-junta and is far from all $(k - e)$-juntas
for some $\omega(1) \le k \le n - \omega(1)$ and $e = o(\sqrt{k})$, testing $f$-isomorphism requires a
super-constant number of queries.

Theorem 10.1. Let $g : \{0,1\}^n \to \{0,1\}$ be a $k$-junta which is $\epsilon$-far from being a $(k - e)$-junta
for some $e \ge 1$. Then any non-adaptive $\epsilon$-tester for $g$-isomorphism must make at
least $\log_2(k'/e^2) - O(1)$ queries, where $k' = \min(k, n - k)$.

The  rest  of  this  chapter  is  dedicated  to  proving  this  theorem.  On  first  reading,  the 
reader  is  encouraged  to  focus  on  the  simplest  case,  where  e  =  1. 


10.1  Two  distributions  on  functions 

We prove Theorem 10.1 with Yao's Minimax Principle [100] via Lemma 3.15. To do
so, we must introduce distributions $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$ on functions that are isomorphic to $f$
and $\epsilon$-far from isomorphic to $f$, respectively, for some fixed function $f$ that satisfies the
conditions of the theorem.

The main challenge in the construction of our distributions is that it must apply to a
large class of functions. That is, we cannot use any other structural property of $f$ except
that it is a $k$-junta and is far from $(k - e)$-juntas. In fact, this restriction suggests a natural
way to define $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$.

In the following, fix $0 < e < k < n$ and let $f$ be a $k$-junta that is also $\epsilon$-far from all
$(k - e)$-juntas. Without loss of generality, we may assume that the $k$ relevant coordinates of
$f$ are $\{1, 2, \ldots, k\}$. Let $f_{\mathrm{core}} : \{0,1\}^k \to \{0,1\}$ be the restriction of $f$ to these coordinates.

The distribution $\mathcal{D}_{\mathrm{yes}}$ is defined in the most natural way, by randomly embedding $f_{\mathrm{core}}$
into $[n]$. More precisely, to draw $g \sim \mathcal{D}_{\mathrm{yes}}$, we first draw a random subset $J \subseteq [n]$
uniformly at random from all subsets of $[n]$ of size $k$. We then draw a random bijection
$\sigma : [k] \to J$ uniformly at random. Finally, we define $g(x) := f_{\mathrm{core}}(x_{\sigma(1)}, \ldots, x_{\sigma(k)})$. This
construction guarantees that every function $g$ in the support of $\mathcal{D}_{\mathrm{yes}}$ is isomorphic to $f$.

The distribution $\mathcal{D}_{\mathrm{no}}$ is defined in a similar way. To draw $g \sim \mathcal{D}_{\mathrm{no}}$, we begin by
drawing a set $J \subseteq [n]$ uniformly at random from all subsets of $[n]$ of size $k - e$. We then
draw a random map $\sigma : [k] \to J$ uniformly at random among all the maps that satisfy the
following conditions:

1. There exists a single element $j^* \in J$ such that $|\{i \in [k] : \sigma(i) = j^*\}| = e + 1$.

2. For every $j \in J \setminus \{j^*\}$, $|\{i \in [k] : \sigma(i) = j\}| = 1$.

We then define $g(x) = f_{\mathrm{core}}(x_{\sigma(1)}, \ldots, x_{\sigma(k)})$. This construction guarantees that every
function $g$ in the support of $\mathcal{D}_{\mathrm{no}}$ is a $(k - e)$-junta. As a result, every such $g$ is $\epsilon$-far from
isomorphic to $f$.

To complete the proof of Theorem 10.1, we show below that for any set of $q =
\log(k'/e^2) - O(1)$ queries, the distributions on the responses obtained from functions
drawn from $\mathcal{D}_{\mathrm{yes}}$ or from $\mathcal{D}_{\mathrm{no}}$ are very similar.
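The two embeddings defined in this section can be sampled with a few lines of Python. This is a minimal sketch under stated assumptions: coordinates are 0-indexed, the map $\sigma$ is represented as a list whose $i$th entry is $\sigma(i)$, and the helper names are hypothetical. For $\mathcal{D}_{\mathrm{no}}$, picking $j^*$ uniformly in $J$ and then shuffling the resulting multiset of targets is uniform over the admissible maps, because every choice of $j^*$ admits the same number of arrangements.

```python
import random
from collections import Counter

def draw_sigma_yes(n, k, rng):
    """sigma for D_yes: a uniformly random injection [k] -> [n] (0-indexed)."""
    return rng.sample(range(n), k)

def draw_sigma_no(n, k, e, rng):
    """sigma for D_no: k - e distinct targets, one of which (j*) is hit e + 1 times."""
    targets = rng.sample(range(n), k - e)
    j_star = rng.choice(targets)
    sigma = targets + [j_star] * e
    rng.shuffle(sigma)
    return sigma

def embed(f_core, sigma):
    """g(x) = f_core(x_sigma(1), ..., x_sigma(k))."""
    return lambda x: f_core(tuple(x[j] for j in sigma))
```

For example, `embed(f_core, draw_sigma_no(n, k, e, rng))` returns a function on $\{0,1\}^n$ that depends on at most $k - e$ coordinates.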


10.2  Distance  between  multivariate  hypergeometrics 

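The typical way to prove a property testing bound such as Theorem 10.1 is as follows.
First, we write the $q$ queries of tester $T$ as $x^1, \ldots, x^q \in \{0,1\}^n$. We then introduce the
response vector random variables $R_{\mathrm{yes}}$ and $R_{\mathrm{no}}$. Here $R_{\mathrm{yes}} \in \{0,1\}^q$ is defined by drawing
$f_{\mathrm{yes}} \sim \mathcal{D}_{\mathrm{yes}}$ and letting $R_{\mathrm{yes}} = (f_{\mathrm{yes}}(x^1), \ldots, f_{\mathrm{yes}}(x^q))$, and $R_{\mathrm{no}}$ is defined analogously.
Finally, we show that

$$ d_{TV}(R_{\mathrm{yes}}, R_{\mathrm{no}}) \le 2^q \cdot \frac{O(e^2)}{k'} + .01. \tag{10.1} $$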

We will in fact prove a stronger statement. To understand it, let's reconsider the complete
random processes $P_{\mathrm{yes}}$ and $P_{\mathrm{no}}$ by which the response vectors $R_{\mathrm{yes}}$ and $R_{\mathrm{no}}$ are
generated. We begin by focusing on the "yes" process, $P_{\mathrm{yes}}$.

Given the tester $T$'s queries $x^1, \ldots, x^q \in \{0,1\}^n$, we think of them as row vectors
and arrange them into a $q \times n$ query matrix $Q$. We will be especially interested in the
columns of this matrix $Q$, the $j$th column consisting of the $j$th bits of all the query strings.
Abstractly, we define the set of all possible column (types)

$$ \mathfrak{C} = \{0,1\}^q. $$

Since $|\mathfrak{C}| = 2^q \ll n$, some columns will occur many times in the matrix $Q$. In fact, we
will think of the query matrix $Q$ as being an ordered multiset of columns from $\mathfrak{C}$.

Recalling the definition of $\mathcal{D}_{\mathrm{yes}}$, we think of the first step of $P_{\mathrm{yes}}$ as choosing $k$ column
indices $j_1, \ldots, j_k$ randomly and without replacement from $[n]$. We next extract columns
$j_1, \ldots, j_k$ from $Q$. We view this as a multiset of columns, and call it the argument multiset
$S_{\mathrm{yes}}$. Next, we randomly order the columns in $S_{\mathrm{yes}}$, forming a $q \times k$ argument matrix $A_{\mathrm{yes}}$.
Finally, we produce the response vector $R_{\mathrm{yes}}$ by applying $g_{\mathrm{core}}$ to the argument matrix,
row-wise.

The reader can easily verify that this process $P_{\mathrm{yes}}$ generates the correct distribution on the
response vector random variable $R_{\mathrm{yes}}$.

The "no" process $P_{\mathrm{no}}$ is very similar, differing only in the way it generates the argument
multiset from the query matrix. Recalling the definition of $\mathcal{D}_{\mathrm{no}}$, we think of $P_{\mathrm{no}}$
as forming the argument multiset $S_{\mathrm{no}}$ by choosing $\ell = k - e$ random columns from $Q$
without replacement, and including an additional $e$ copies of the first-chosen column. The
process $P_{\mathrm{no}}$ then forms the argument matrix $A_{\mathrm{no}}$ by again randomly ordering the columns
in the argument multiset, and finally produces the response vector $R_{\mathrm{no}}$ again by applying
$g_{\mathrm{core}}$ to $A_{\mathrm{no}}$, row-wise. The reader can again easily verify that $P_{\mathrm{no}}$ generates the correct
distribution on $R_{\mathrm{no}}$.

Because the processes are identical after the argument multiset is formed, a coupling
argument immediately implies that

$$ d_{TV}(R_{\mathrm{yes}}, R_{\mathrm{no}}) \le d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}}). \tag{10.2} $$


This inequality can be extremely lossy, depending on the function $g_{\mathrm{core}}$. However, since
Theorem 10.1 applies for an extremely broad range of functions, we are almost forced to
design a proof of Theorem 10.1 that uses no properties of the function $g_{\mathrm{core}}$. That is, in the
absence of additional restrictions on the class of functions considered, there is no obvious
way to bound $d_{TV}(R_{\mathrm{yes}}, R_{\mathrm{no}})$ except by $d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}})$.

Letting $\mathcal{S}_{\mathrm{yes}}$ denote the subprocess of $P_{\mathrm{yes}}$ generating $S_{\mathrm{yes}}$, and similarly for $\mathcal{S}_{\mathrm{no}}$, we
have reduced proving (10.1), and hence Theorem 10.1, to the following:


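Theorem 10.2. For $S_{\mathrm{yes}} \sim \mathcal{S}_{\mathrm{yes}}$, $S_{\mathrm{no}} \sim \mathcal{S}_{\mathrm{no}}$, we have

$$ d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}}) \le |\mathfrak{C}| \cdot \frac{O(e^2)}{\min(k, n - k)} + .01. $$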


The reader can see now why our query complexity lower bound in Theorem 10.1 is
only logarithmic; we have $|\mathfrak{C}| = 2^q$ competing against the $\frac{O(e^2)}{\min(k, n - k)}$ factor in the above bound. Indeed, we
can never prove a better-than-logarithmic lower bound if our proof only involves showing
statistical closeness of the argument multisets $S_{\mathrm{yes}}$ and $S_{\mathrm{no}}$. To see this, suppose $k = n/2$,
so $n - k = n/2$ as well. Then if $2^q \gg n/2$, it is possible that every column in the query
matrix is unique. In this case, the total variation distance between argument multisets
$S_{\mathrm{yes}}$ and $S_{\mathrm{no}}$ will be 1 even in the case $e = 1$, because $S_{\mathrm{yes}}$ will always consist of unique
columns, whereas $S_{\mathrm{no}}$ will always have one column duplicated.
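The argument multisets are simple to simulate, which makes the obstruction just described easy to see. The following sketch is illustrative only (the function names are not from the text): it extracts the columns of a query matrix and draws $S_{\mathrm{yes}}$ and $S_{\mathrm{no}}$ as `Counter` objects mapping each column type to its multiplicity.

```python
import random
from collections import Counter

def columns_of(queries):
    """Columns of the q x n query matrix whose rows are the query strings."""
    n = len(queries[0])
    return [tuple(row[j] for row in queries) for j in range(n)]

def draw_s_yes(cols, k, rng):
    """Argument multiset S_yes: k columns drawn without replacement."""
    return Counter(rng.sample(cols, k))

def draw_s_no(cols, k, e, rng):
    """Argument multiset S_no: k - e columns without replacement, plus e extra
    copies of the first-chosen column."""
    chosen = rng.sample(cols, k - e)
    s_no = Counter(chosen)
    s_no[chosen[0]] += e
    return s_no
```

When all $n$ columns of the query matrix are distinct, every draw of `draw_s_yes` has maximum multiplicity 1 while every draw of `draw_s_no` has maximum multiplicity $e + 1$, so the two multisets are perfectly distinguishable.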

Notice that the ordering of the columns in the query matrix $Q$ has proven to be unimportant;
we can think of $Q$ simply as an unordered multiset of columns from $\mathfrak{C}$. Thus
Theorem 10.2 is really a statement about the total variation distance between certain
multivariate hypergeometric random variables. Specifically, for each column $c \in \mathfrak{C}$, let $m(c)$
denote the number of copies of $c$ in $Q$. In process $\mathcal{S}_{\mathrm{yes}}$, we choose $k$ random columns from
$Q$ without replacement and count the number of copies of each column (type) in the draw.
Process $\mathcal{S}_{\mathrm{no}}$ is similar, except we choose $\ell$ random columns from $Q$ without replacement,
and count an extra $e$ copies of the first-drawn column.


10.2.1  Reduction  of  Theorem  10.2  to  two  lemmas 


The preceding discussion motivates the following notation:

Definition 10.3. Given integers $N, e \ge 1$, $M, L \ge 0$, with $M, L + e \le N$, we define

$$ \Delta_{N,M,L}(e) = d_{TV}(X, Y), \quad \text{where } X \sim \mathcal{H}_{N,M,L+e} \text{ and } Y \sim \mathcal{H}_{N,M,L} + e. $$
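For small parameters the quantity $\Delta_{N,M,L}(e)$ can be computed exactly. The sketch below assumes that $\mathcal{H}_{N,M,L}$ denotes the hypergeometric distribution of the number of white balls among $L$ draws without replacement from an urn of $N$ balls, $M$ of which are white (the urn model used in the proof of Lemma 10.5); the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, M, L, u):
    """Pr[H = u], where H counts white balls among L draws from N balls, M white."""
    if u < 0 or u > M or L - u < 0 or L - u > N - M:
        return Fraction(0)
    return Fraction(comb(M, u) * comb(N - M, L - u), comb(N, L))

def delta(N, M, L, e):
    """Delta_{N,M,L}(e) = d_TV(X, Y) with X ~ H_{N,M,L+e} and Y ~ H_{N,M,L} + e."""
    total = Fraction(0)
    for u in range(L + e + 1):
        # Pr[Y = u] = Pr[H_{N,M,L} = u - e].
        total += abs(hypergeom_pmf(N, M, L + e, u) - hypergeom_pmf(N, M, L, u - e))
    return total / 2
```

Exact rational arithmetic avoids any rounding in the comparison of the two probability mass functions. The extreme cases behave as expected: when every ball is white the two variables coincide, and when no ball is white they are always $e$ apart.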

The proof of Theorem 10.2 relies on the following two lemmas. The first lemma
is relatively straightforward, and relates the distance between $S_{\mathrm{yes}}$ and $S_{\mathrm{no}}$ to the total
variation distance between hypergeometric distributions.

Lemma 10.4.

$$ d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}}) \le \sum_{c \in \mathfrak{C} :\, m(c) \ne 0} \frac{m(c)}{n} \cdot \Delta_{n-1,\, m(c)-1,\, \ell-1}(e). $$


The second lemma is a total variation distance bound between (univariate) hypergeometric
random variables which may be of independent interest.

Lemma 10.5. There is a universal constant $2 < \kappa < \infty$ such that for any $N, M, L$, if
$L' = \min(L, N - L)$ satisfies $\frac{ML'}{N} \ge \kappa e^2$, then $\Delta_{N,M,L}(e) \le .01$.

We briefly comment on why the hypothesis $\frac{ML'}{N} \gg e^2$ is necessary to show that $\mathcal{H}_{N,M,L+e}$
and $\mathcal{H}_{N,M,L} + e$ are close in total variation distance. For simplicity, first suppose that $e = 1$.

It is necessary that $\frac{ML}{N} \gg 1$; this quantity is the mean of $\mathcal{H}_{N,M,L}$, and if it is $\ll 1$
then $X \sim \mathcal{H}_{N,M,L+1}$ is likely to be 0 whereas $Y \sim \mathcal{H}_{N,M,L} + 1$ is at least 1. Second, it is
also necessary that $\frac{M(N-L)}{N} = M(1 - \frac{L}{N}) \gg 1$. To see this, note that if by way of contrast
$1 - \frac{L}{N} \ll \frac{1}{M}$, then $X$ is concentrated at $M$ and $Y$ is concentrated at $M + 1$. Finally, to
understand the hypothesis's dependence on $e$, suppose $M = N/2$ and $L$ is quite small.
Then $\mathcal{H}_{N,M,L}$ is distributed very much like $\mathrm{Bin}(L, \frac{1}{2})$; hence we require $L \gg e^2$ or else the
extra $+e$ in $Y$ will dominate the standard deviation of $\mathrm{Bin}(L, \frac{1}{2})$.

We prove Lemmas 10.4 and 10.5 in the next sections, but first we show how
Theorem 10.2 follows from the lemmas.


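Proof of Theorem 10.2. Note that we may freely assume $k \ge 2e + 2$, as otherwise the
bound we are trying to prove exceeds 1 (assuming the constant in the $O(\cdot)$ is large enough).
Let us introduce the notation $N = n - 1$, $M(c) = m(c) - 1$, $L = \ell - 1$, $L' = \min(L, N - L)$.
Then by Lemma 10.4,

$$ \begin{aligned}
d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}}) &\le \sum_{c \in \mathfrak{C} :\, m(c) \ne 0} \frac{m(c)}{n} \cdot \Delta_{n-1,\, m(c)-1,\, \ell-1}(e) \\
&= \sum_{c :\, \frac{M(c)}{N} < \frac{\kappa e^2}{L'}} \frac{m(c)}{n} \cdot \Delta_{N, M(c), L}(e) + \sum_{c :\, \frac{M(c)}{N} \ge \frac{\kappa e^2}{L'}} \frac{m(c)}{n} \cdot \Delta_{N, M(c), L}(e) \\
&\le \sum_{c :\, \frac{M(c)}{N} < \frac{\kappa e^2}{L'}} \frac{m(c)}{n} + \sum_{c :\, \frac{M(c)}{N} \ge \frac{\kappa e^2}{L'}} \frac{m(c)}{n} \cdot .01,
\end{aligned} $$

where all sums range over columns $c \in \mathfrak{C}$ with $m(c) \ne 0$ and the last inequality uses Lemma 10.5.
Since $\sum_{c \in \mathfrak{C}} m(c) = n$, the second sum above
is at most .01. Thus it remains to bound the first sum by $|\mathfrak{C}| \cdot \frac{O(e^2)}{\min(k, n - k)}$. There are at most
$|\mathfrak{C}|$ summands in this first sum, and for each we have

$$ \frac{m(c)}{n} = \frac{M(c) + 1}{N + 1} \le \frac{M(c)}{N} + \frac{1}{n} \le \frac{\kappa e^2}{L'} + \frac{1}{n} \le \frac{2 \kappa e^2}{L'} $$

by the condition of the sum.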

To complete the proof, it remains to show that $L' = \min(\ell - 1, n - \ell) \ge \Omega(\min(k, n - k))$.
When $\ell - 1 \le n - \ell$, then $L' = \ell - 1 = k - e - 1 \ge k/2$ by the fact that $k \ge 2e + 2$.
And when $n - \ell < \ell - 1$, then $L' = n - \ell = n - k + e \ge n - k$. So $L' \ge \frac{1}{2} \min(k, n - k)$,
as we wanted to show. □
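The final inequality $L' \ge \frac{1}{2}\min(k, n - k)$ is elementary, and an exhaustive check over small parameters offers a quick sanity test of the case analysis. The helper below is a hypothetical illustration, not part of the proof; it searches for a counterexample with $k \ge 2e + 2$ and returns `None` if there is none.

```python
def check_l_prime_bound(max_n):
    """Search n <= max_n for a violation of 2 * L' >= min(k, n - k) when k >= 2e + 2."""
    for n in range(1, max_n + 1):
        for k in range(1, n + 1):
            for e in range(1, k):
                if k < 2 * e + 2:
                    continue
                ell = k - e                       # number of columns drawn in the "no" process
                l_prime = min(ell - 1, n - ell)   # L' = min(L, N - L) with N = n - 1, L = ell - 1
                if 2 * l_prime < min(k, n - k):
                    return (n, k, e)              # counterexample found
    return None
```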


10.2.2  Proof  of  Lemma  10.4 

Let us think of the experiment $\mathcal{S}_{\mathrm{yes}}$ in an alternate way. We begin by choosing a first
column from $Q$ for $S_{\mathrm{yes}}$; call it $C_1$. We next decide how many additional copies of $C_1$
to include into $S_{\mathrm{yes}}$. Call this quantity $T$. We have

$$ T \mid (C_1 = c) \sim \mathcal{H}_{n-1,\, m(c)-1,\, k-1}. $$

(Note that $m(c) - 1 \ge 0$ always, because $c$ won't be chosen if $m(c) = 0$.) So far, $S_{\mathrm{yes}}$
consists of $T + 1$ copies of $C_1$. Finally, we complete the draw of $S_{\mathrm{yes}}$ by choosing $k - (T + 1)$
columns without replacement from "$Q \setminus C_1$", meaning the multiset of columns formed
from $Q$ by removing all copies of $C_1$.

We think of the experiment $\mathcal{S}_{\mathrm{no}}$ in a similar way. Again, we begin by choosing a first
column $C_1$ from $Q$ for $S_{\mathrm{no}}$. We next determine how many additional copies of $C_1$ there
will be from among the remaining $\ell - 1$ choices. Calling this quantity $U$, we have

$$ U \mid (C_1 = c) \sim \mathcal{H}_{n-1,\, m(c)-1,\, \ell-1}. $$

Recall, however, that in $\mathcal{S}_{\mathrm{no}}$, we include an additional $e$ copies of $C_1$ into $S_{\mathrm{no}}$. Hence $S_{\mathrm{no}}$
ends up with $U + e + 1$ copies of $C_1$. Finally, we complete $S_{\mathrm{no}}$ by adding $\ell - (U + 1)$
columns drawn without replacement from $Q \setminus C_1$.

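Let $V = U + e$. We claim that by coupling the random variables $T \mid (C_1 = c)$ and
$V \mid (C_1 = c)$, we couple $S_{\mathrm{yes}}$ and $S_{\mathrm{no}}$. This follows immediately from the two descriptions,
as then $T + 1 = V + 1 = U + e + 1$, and $k - (T + 1) = \ell + e - (V + 1) = \ell - (U + 1)$.
Hence

$$ d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}}) \le \sum_{c \in \mathfrak{C}} \Pr[C_1 = c] \cdot d_{TV}(T \mid (C_1 = c),\, V \mid (C_1 = c)). $$

On one hand, $\Pr[C_1 = c]$ is simply $\frac{m(c)}{n}$. On the other hand, we have

$$ T \mid (C_1 = c) \sim \mathcal{H}_{n-1,\, m(c)-1,\, \ell+e-1}, $$

$$ V \mid (C_1 = c) \sim \mathcal{H}_{n-1,\, m(c)-1,\, \ell-1} + e. $$

So by definition, $d_{TV}(T \mid (C_1 = c),\, V \mid (C_1 = c)) = \Delta_{n-1,\, m(c)-1,\, \ell-1}(e)$, and hence

$$ d_{TV}(S_{\mathrm{yes}}, S_{\mathrm{no}}) \le \sum_{c \in \mathfrak{C} :\, m(c) \ne 0} \frac{m(c)}{n} \cdot \Delta_{n-1,\, m(c)-1,\, \ell-1}(e), $$

as claimed.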


10.2.3  Proof  of  Lemma  10.5 


Recall that $L' = \min(L, N - L)$,

$$ \frac{ML'}{N} \ge \kappa e^2, \tag{10.3} $$

and our goal is to bound $\Delta_{N,M,L}(e) = d_{TV}(X, Y) \le .01$, where $X \sim \mathcal{H}_{N,M,L+e}$ and
$Y \sim \mathcal{H}_{N,M,L} + e$.


We begin by coupling $X$ and $Y$, as follows. Imagine drawing balls randomly and
without replacement from an urn containing $N$ balls, $M$ of which are white. We draw
$L + e$ balls from the urn. We let $X$ be the number of white balls among all balls drawn; we
let $Y$ be the number of white balls among the first $L$ balls drawn, plus $e$. Note that $X \le Y$
always under this coupling.
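The coupling is straightforward to simulate. The sketch below (an illustration; the function name and the list representation of the urn are assumptions) draws $L + e$ balls without replacement and returns the coupled pair $(X, Y)$.

```python
import random

def coupled_draw(N, M, L, e, rng):
    """Draw L + e balls without replacement; return the coupled pair (X, Y)."""
    urn = [1] * M + [0] * (N - M)   # 1 marks a white ball
    drawn = rng.sample(urn, L + e)  # a uniformly random ordered draw
    x = sum(drawn)                  # white balls among all L + e draws
    y = sum(drawn[:L]) + e          # white balls among the first L draws, plus e
    return x, y
```

Because the last $e$ draws contribute at most $e$ white balls, every sampled pair satisfies $X \le Y$, with equality exactly when all of the last $e$ balls are white.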

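Let us now compare the probability mass functions of $X$ and $Y$. The integers $u < e$
can be in $X$'s range but not $Y$'s; the integers $u > \min(M, L + e)$ can be in $Y$'s range but
not $X$'s. The remaining integers are in the range of both $X$ and $Y$, and we have

$$ \frac{\Pr[X = u]}{\Pr[Y = u]} = \frac{\binom{M}{u}\binom{N-M}{L+e-u} \Big/ \binom{N}{L+e}}{\binom{M}{u-e}\binom{N-M}{L-(u-e)} \Big/ \binom{N}{L}} = \frac{\binom{M}{u}}{\binom{M}{u-e}} \cdot \frac{\binom{N}{L}}{\binom{N}{L+e}} = \frac{(M-u+e)(M-u+e-1)\cdots(M-u+1)}{u(u-1)\cdots(u-e+1)} \cdot \frac{\binom{N}{L}}{\binom{N}{L+e}}. $$

Evidently (and unsurprisingly), this ratio is a decreasing function of $u$. Letting $t$ be the
largest integer for which the ratio is at least 1, we conclude that

$$ \Pr[X = u] \ge \Pr[Y = u] \quad \text{iff} \quad u \le t. $$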


It follows immediately that

$$ d_{TV}(X, Y) = \Pr[X \le t] - \Pr[Y \le t]. $$

But by our coupling,

$$ \Pr[X \le t] - \Pr[Y \le t] = \Pr[X \le t \cap Y > t] - \Pr[X > t \cap Y \le t] = \Pr[X \le t \cap Y > t], $$

since $X \le Y$ always. Our goal, then, is to bound

$$ d_{TV}(X, Y) = \Pr[X \le t \cap Y > t]. \tag{10.4} $$

We  will  in  fact  prove  something  slightly  stronger:  we  will  show  that  for  any  value  of  t,  the 
right-hand  side  of  (10.4)  is  small. 

To analyze (10.4) we recall the ball and urn process defining $X$ and $Y$. Having drawn
$L + e$ balls, let $W$ be the number of white balls among the last $e$ balls drawn, and let $Z$ be
the number of white balls among the first $L$. Thus $X = W + Z$ and $Y = e + Z$. As a first
observation, we may note that if $W = e$ then $X = Y$ and hence the event in (10.4) does
not occur. I.e.,

$$ d_{TV}(X, Y) \le \Pr[W \ne e] \le e\left(1 - \tfrac{M}{N}\right), \tag{10.5} $$

where  we  used  a  union  bound  over  each  of  the  last  e  balls  being  non-white.  Now  by  (10.3), 


\[ e \le \cdots \le .001\sqrt{N}, \tag{10.6} \]

if  we  assume  k  large  enough.  It  follows  that  we  may  additionally  assume 

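\[ M \le N - .01\sqrt{N} \quad\Longleftrightarrow\quad 1 - \frac{M}{N} \ge \frac{.01}{\sqrt{N}}, \tag{10.7} \]

because otherwise the bound in (10.5) is at most $.001\sqrt{N} \cdot \frac{.01}{\sqrt{N}} = .00001$, which establishes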
the  theorem  with  room  to  spare.  We  also  use  this  opportunity  to  mention  that 

\[ M \ge 2e, \quad L' \ge 2e \quad \text{(and hence certainly } N \ge 2e\text{)} \tag{10.8} \]

follow  easily  from  (10.3). 

We  next  give  a  more  refined  upper  bound  on  (10.4).  By  conditioning  on  W  we  have 

\[
d_{TV}(X, Y) = \Pr[X \le t \cap Y > t] = \sum_{i=0}^{e-1} \Pr[W = i] \cdot \Pr[t - e < Z \le t - i \mid W = i].
\]

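Now $Z \mid (W = i)$ has distribution $\mathcal{H}_{N-e,M-i,L}$ (and note that $M - i \ge M - e > 0$
by (10.8)). Let us write $\sigma_i^2 = L\left(1 - \frac{L}{N-e}\right)\frac{M-i}{N-e}\left(1 - \frac{M-i}{N-e}\right)$. Applying Corollary ?? and a
union bound we get

\[
d_{TV}(X, Y) \le \sum_{i=0}^{e-1} \Pr[W = i] \cdot (e - i) \cdot \frac{C}{\sigma_i}
\le \max_{0 \le i < e}\left\{\frac{C}{\sigma_i}\right\} \cdot \sum_{i=0}^{e-1} \Pr[W = i](e - i)
= \max_{0 \le i < e}\left\{\frac{C}{\sigma_i}\right\} \cdot E[e - W].
\]

We have $W \sim \mathcal{H}_{N,M,e}$, and thus $E[e - W] = e\left(1 - \frac{M}{N}\right)$. And by definition,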


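\[
\max_{0 \le i < e}\left\{\frac{C}{\sigma_i}\right\}
= \max_{0 \le i < e}\left\{\frac{C}{\sqrt{L\left(1 - \frac{L}{N-e}\right)\frac{M-i}{N-e}\left(1 - \frac{M-i}{N-e}\right)}}\right\}
\le \frac{C}{\sqrt{L\left(1 - \frac{L}{N-e}\right)\left(\frac{M-e}{N-e}\right)\left(1 - \frac{M}{N-e}\right)}}.
\]

Thus we have established

\[
d_{TV}(X, Y) \le \frac{Ce}{\sqrt{L\left(1 - \frac{L}{N-e}\right)}} \cdot \frac{1 - \frac{M}{N}}{\sqrt{1 - \frac{M}{N-e}}} \cdot \frac{1}{\sqrt{\frac{M-e}{N-e}}}. \tag{10.9}
\]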


We will bound the three fractions in (10.9) one at a time. We begin with the middle
one. Note first that

\[
\frac{d}{dM}\left(\frac{1 - \frac{M}{N}}{\sqrt{1 - \frac{M}{N-e}}}\right) = -\frac{N - 2e - M}{2N\sqrt{1 - \frac{M}{N-e}}\,(N - e - M)}.
\]

By combining (10.6) and (10.7) we get $M \le N - 10e < N - 2e$. Hence the derivative
above is always negative, implying that $\left(1 - \frac{M}{N}\right)/\sqrt{1 - \frac{M}{N-e}}$ is a decreasing function of
$M$ on $M$'s range. Hence we may upper-bound this fraction by taking $M = 0$, giving an
upper bound of 1. Substituting this into (10.9) gives

\[
d_{TV}(X, Y) \le \frac{Ce}{\sqrt{L\left(1 - \frac{L}{N-e}\right)}} \cdot \frac{1}{\sqrt{\frac{M-e}{N-e}}}. \tag{10.10}
\]

We  next  examine  the  fraction  on  the  right.  It  is  at  most 


1 


< 


where  we  used  (10.8).  By  virtue  of  (10.3),  we  can  upper-bound  this  by 


Substi- 

e 




tuting  this  upper  bound  into  (10.10)  yields 


dTV(x,y)  <  c 


<  .001 


(10.11) 


assuming  k  is  sufficiently  large  compared  with  C. 

Finally, we split into two cases, depending on whether $L \le N/2$. If indeed $L \le N/2$,
then $L' = L$ and we have

\[ d_{TV}(X, Y) \le \frac{.001}{\sqrt{1 - \frac{L'}{N-e}}}. \]

But $N - e \ge N - .001\sqrt{N} \ge (2/3)N$ (using (10.6) and $N \ge 2$ from (10.8)), so we
upper-bound

\[ d_{TV}(X, Y) \le \frac{.001}{\sqrt{1 - \frac{N/2}{(2/3)N}}} = .002 \le .01, \]

as needed. The second case is that $L > N/2$, in which case $L = N - L'$ and the bound
in (10.11) is

\[
.001\sqrt{\frac{L'}{(N - L')\left(1 - \frac{N-L'}{N-e}\right)}}
= .001\sqrt{\frac{L'}{(N - L')\frac{L'-e}{N-e}}}
= .001\sqrt{\frac{L'}{L'-e}}\sqrt{\frac{N-e}{N-L'}}. \tag{10.12}
\]

But using (10.8),

\[ \sqrt{\frac{L'}{L'-e}} \le \sqrt{\frac{L'}{L'/2}} = \sqrt{2}, \]

and using $L' \le N/2$,

\[ \sqrt{\frac{N-e}{N-L'}} \le \sqrt{\frac{N}{N-L'}} \le \sqrt{\frac{N}{N/2}} = \sqrt{2}. \]

Hence the upper bound (10.12) on $d_{TV}(X, Y)$ is at most $.001\sqrt{2}\sqrt{2} \le .01$, as needed.
This completes the proof of Lemma 10.5.

10.3  Notes and Discussion


Majority functions. The lower bound in this chapter applies to most $k$-juntas that we
consider to "strongly depend" on all $k$ of their relevant variables. One notable exception
to this observation, however, is the class of majority functions on exactly $k$ variables. This
function is symmetric, so intuitively it "strongly depends" on all $k$ of its relevant variables.
Majority functions on $k$ variables, however, are $o(1)$-close to majority functions on $k - e$
variables for any $e = o(k)$. As a result, the lower bound of this chapter does not apply to
the problem of testing isomorphism to the $k$-majority functions.

As a result of this perceived gap, in [25] we also proved a lower bound for testing
isomorphism to $k$-majority functions. In fact, the bound obtained for this problem is stronger
than the one in this chapter and shows that $\mathrm{poly}(k)$ queries are required to non-adaptively
test isomorphism to $k$-majority functions.


Generalizing the lower bound. At first glance, the reader may wonder why we require
the two conditions that $f$ be a $k$-junta and that $f$ be far from $(k - e)$-juntas for some small
$e$ in the statement of Theorem 10.1. A natural question to ask is whether the theorem can
be generalized by removing one of these two conditions. It cannot. To see this, consider
the constant function and the parity function. The constant function is a $k$-junta for any
$0 \le k \le n$ and the parity function is far from all $(k - e)$-juntas for any $k - e < n$. But
both  functions  are  symmetric  so  testing  isomorphism  to  either  of  them  can  be  done  with  a 
constant  number  of  queries. 

It  is  possible,  however,  that  Theorem  10.1  may  be  generalized  by  establishing  a  lower 
bound  on  the  query  complexity  for  testing  isomorphism  to  all  functions  that  are  far  from  all 
k-symmetric  functions.  In  [26],  we  conjecture  that  this  is  indeed  the  case,  but  the  problem 
remains  open.  Note  that  if  this  result  were  established,  that  lower  bound  combined  with 
the  result  presented  in  Chapter  8  would  essentially  characterize  the  set  of  functions  that 
are efficiently isomorphism-testable.

Part III
Connections

Chapter 11

Communication  Complexity 


Communication complexity is an area of theoretical computer science that has developed
techniques which have been spectacularly effective in proving lower bounds in a large
variety of other areas of computer science. In this chapter, we will explore how communication
complexity can also be used to prove strong lower bounds on the query complexity
of different property testing problems. As a bonus, we will also see how the resulting
proofs are often surprisingly simple.

The basic setup in communication complexity is straightforward. We have two players,
Alice and Bob, who each receive some input. They cannot see each other's inputs,
but would like to compute some function on their joint input. For example, Alice may
receive a set $A \subseteq [n]$, Bob may receive a set $B \subseteq [n]$, and they might want to determine
whether their sets intersect or are disjoint. They do so by communicating some information
to each other (following a pre-determined protocol), until they have the desired
answer. The main goal in communication complexity is to determine the minimum number
of bits that Alice and Bob must communicate to each other to compute the answer.

The  versatility  of  communication  complexity  in  establishing  lower  bounds  in  various 
areas  of  computer  science  stems  mainly  from  two  useful  properties.  The  first  is  that  there 
are  very  strong  lower  bounds  on  the  communication  complexity  required  to  compute  some 
basic functions. For example, it is known that even when Alice and Bob run a randomized
protocol and have access to a common source of randomness, solving the set disjointness
problem mentioned above requires $\Omega(n)$ bits of communication [70, 86]. (In other words,
Alice and Bob cannot do asymptotically better than just exchanging $A$ and $B$ with a trivial
communication protocol.)
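
The trivial protocol mentioned above can be sketched as follows. This is an illustration only: the function name and the encoding of sets as subsets of {1, ..., n} are our own choices. Alice sends the full n-bit indicator vector of her set and Bob answers locally, so the cost is exactly n bits.

```python
def trivial_disjointness_protocol(A, B, n):
    """Simulate the trivial protocol: Alice sends her whole set, Bob answers.

    Returns (sets_intersect, number_of_bits_sent).
    """
    # Alice -> Bob: the n-bit indicator vector of A.
    message = [1 if i in A else 0 for i in range(1, n + 1)]
    # Bob decides locally, using only the message and his own set B.
    sets_intersect = any(message[i - 1] == 1 for i in B)
    return sets_intersect, len(message)
```

The lower bound cited above says that no randomized protocol can beat this n-bit cost by more than a constant factor.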

The second reason that communication complexity is so versatile is that there are often
natural reductions from communication complexity to problems in other areas of computer
science. The main contribution of this chapter is to establish such a reduction for property
testing. We will then apply this reduction to establish lower bounds on the query complexity
for three different property testing problems. The notes section at the end of the
chapter discusses more lower bounds we can obtain with this method.

For the first result in this chapter, we revisit the problem of testing $k$-linearity. We saw
in Chapter 7 that $\min\{k, n - k\} \cdot (1 - o(1))$ queries are required to test $k$-linearity. In this
chapter, we establish a slightly less precise lower bound of $\Omega(\min\{k, n - k\})$ queries. The
proof, however, ends up being significantly simpler.

Theorem 11.1. For any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $k$-linearity requires $\Omega(\min\{k, n - k\})$
queries.

The second problem we examine is testing monotonicity. For this problem, we consider
a slightly more general class of functions. Fix $R \subseteq \mathbb{R}$. Recall that the function
$f : \{0,1\}^n \to R$ is monotone if for any $x \preceq y$, $f(x) \le f(y)$. We show that when $R$ is
large enough, testing monotonicity requires a large number of queries.

Theorem 11.2. Fix $R \subseteq \mathbb{R}$. For any $0 < \epsilon < \frac{1}{8}$, $\epsilon$-testing the function $f : \{0,1\}^n \to R$
for monotonicity requires $\Omega(\min\{n, |R|^2\})$ queries.

Finally, the third result we establish in this chapter is a lower bound for testing whether
a function can be computed by a decision tree of size $s$. We saw in Theorem 7.10 that
$\Omega(\log s)$ queries are required to test this property. The next result shows that if we require
the tester to have one-sided error (i.e., to always accept functions computable by size-$s$
decision trees), then $\Omega(s)$ queries are required for the task.

Theorem 11.3. For any $0 < \epsilon < \frac{1}{32}$, at least $\Omega(s)$ queries are required to $\epsilon$-test size-$s$
decision trees with one-sided error.


11.1  Communication  Complexity  Definitions 

This  section  contains  a  brief  introduction  to  the  communication  complexity  definitions  and 
results  that  will  be  used  in  the  proofs  in  the  remainder  of  the  chapter.  For  a  more  detailed 
introduction  to  communication  complexity,  we  highly  recommend  the  book  of  Kushilevitz 
and Nisan [76] to the reader.

11.1.1  The Model


In a typical communication game, there are two parties: Alice, who receives an input $x$,
and Bob, who receives some input $y$. Alice and Bob wish to jointly compute some function
$f(x, y)$ of their inputs. Neither player sees all the information needed to compute $f$, so
they must communicate together to solve the problem. Communication complexity is the
study of how much communication is necessary to compute $f$, for various functions $f$.

A protocol is a distributed algorithm that Alice and Bob use to compute $f(x, y)$; in
particular, it specifies what messages Alice and Bob send to each other. In a deterministic
protocol, Alice's messages are a function only of her input $x$ and the previous communication
in the protocol. Similarly, Bob's messages are a function of $y$ and the previous
communication. The cost of a protocol is the maximum (over all inputs) number of bits
sent by Alice and Bob. The deterministic communication complexity of $f$, denoted $D(f)$,
is the minimum cost of a deterministic protocol computing $f$.

In a randomized protocol, Alice and Bob have shared access to a (public coin) random
string $r \in \{0,1\}^*$. We say that $P$ is a $\delta$-error protocol for $f$ if for any input pair $x, y$, $P$
computes $f(x, y)$ with probability at least $1 - \delta$, where the probability is taken over the
random string $r$. We use $R_\delta(f)$ to denote the minimum cost of a $\delta$-error protocol for $f$
and define $R(f) := R_{1/3}(f)$. When $f$ is a binary function, we say that a protocol computes
$f$ with one-sided error if there exists $z \in \{0,1\}$ such that $P$ computes $f$ with certainty
whenever $f(x, y) \ne z$, and with probability at least $1 - \delta$ when $f(x, y) = z$. When
considering randomized protocols with one-sided error, it is important to note which "side"
the error guarantee is on. We use $R^z_\delta(f)$ to denote the minimum cost of a randomized
protocol for $f$ that correctly computes $f$ whenever $f(x, y) \ne z$ and computes $f$ with
probability at least $1 - \delta$ whenever $f(x, y) = z$. We define $R^z(f) := R^z_{1/3}(f)$.

A protocol is one-way if the communication consists of a single message from Alice to
Bob, who then outputs an answer. We use $R^{\to}_\delta(f)$ to denote the minimum communication
cost of a randomized, $\delta$-error, one-way protocol for $f$. Finally, we use $R^{\to,z}_\delta(f)$ to denote
the minimum communication cost of randomized one-way protocols for $f$ with one-sided
error $\delta$, and we define $R^{\to,z}(f) := R^{\to,z}_{1/3}(f)$.

11.1.2  Set Disjointness

In the Set Disjointness communication problem, Alice and Bob are given $n$-bit strings $x$
and $y$ respectively and wish to compute

\[ \mathrm{DISJ}_n(x, y) := \bigvee_{i=1}^{n} x_i \wedge y_i. \]

Equivalently, Alice and Bob's inputs can be described as sets $A, B \subseteq [n]$. In this case,
$\mathrm{DISJ}(A, B) = 1$ if and only if their sets intersect.

When $n$ is clear from context, we drop the subscript. A celebrated result of Kalyanasundaram
and Schnitger, later simplified by Razborov, showed that $R(\mathrm{DISJ}_n) = \Omega(n)$,
even under the promise that $A$ and $B$ intersect in at most one element.

Theorem 11.4 ([70, 86]). $R(\mathrm{DISJ}_n) = \Omega(n)$.

We will also use a balanced version of disjointness called $k$-BAL-DISJ. In this version,
Alice receives a set $A \subseteq [n]$ of size $|A| = \lfloor k/2 \rfloor + 1$, Bob receives a set $B \subseteq [n]$ of size
$\lceil k/2 \rceil + 1$, and there is a promise that $|A \cap B| \le 1$.

Lemma 11.5. For all $0 \le k \le n - 2$, we have $R(k\text{-BAL-DISJ}) = \Omega(\min\{k, n - k\})$.

Proof. If $n - k = O(1)$, there is nothing to prove. Otherwise, let $m := \min\{\lfloor k/2 \rfloor + 1, n - k - 2\}$.
We reduce from $\mathrm{DISJ}_m$. Partition the elements of $[n] \setminus [m]$ into sets
$I := \{m + 1, \ldots, m + 1 + \lfloor k/2 \rfloor\}$ and $J := \{m + 2 + \lfloor k/2 \rfloor, \ldots, n\}$. Note that $|I| = \lfloor k/2 \rfloor + 1$.
Furthermore, we have $|J| \ge \lceil k/2 \rceil + 1$, since

\[
\begin{aligned}
|J| &= n - (m + 2 + \lfloor k/2 \rfloor) + 1 \\
    &= n - 1 - m - \lfloor k/2 \rfloor \\
    &= n - 1 - m + \lceil k/2 \rceil - k \\
    &= \lceil k/2 \rceil + 1 + n - 2 - m - k \\
    &\ge \lceil k/2 \rceil + 1,
\end{aligned}
\]

where the penultimate equality holds because $k = \lfloor k/2 \rfloor + \lceil k/2 \rceil$, and the inequality
comes from the fact that $m \le n - k - 2$.

Let $A'$ and $B'$ be the sets received by Alice and Bob respectively as inputs to $\mathrm{DISJ}_m$.
Alice pads her input with elements from $I$ until she gets a set of size $\lfloor k/2 \rfloor + 1$. Bob
similarly pads his input with elements from $J$. Let $a := \lfloor k/2 \rfloor + 1 - |A'|$ and
$b := \lceil k/2 \rceil + 1 - |B'|$. Specifically, Alice sets $A = A' \cup \{m + 1, \ldots, m + a\}$ and Bob sets
$B = B' \cup \{n, n - 1, \ldots, n - b + 1\}$.

Note that $|A| = \lfloor k/2 \rfloor + 1$, $|B| = \lceil k/2 \rceil + 1$, and $A \cap B = A' \cap B'$. Therefore, a
solution to $k$-BAL-DISJ$(A, B)$ gives a solution to $\mathrm{DISJ}_m(A', B')$, hence

\[ R(k\text{-BAL-DISJ}) \ge R(\mathrm{DISJ}_m) = \Omega(m) = \Omega(\min\{k, n - k\}). \qquad \Box \]
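
The padding step in the proof above can be sketched in a few lines. This is a minimal illustration, assuming elements are numbered 1 through n as in the proof; the function name `pad_to_balanced` is our own.

```python
def pad_to_balanced(A_prime, B_prime, n, k):
    """Turn a DISJ_m instance (A', B') over [m] into a k-BAL-DISJ instance (A, B) over [n]."""
    m = min(k // 2 + 1, n - k - 2)
    assert m >= 1 and all(1 <= i <= m for i in A_prime | B_prime)
    a = k // 2 + 1 - len(A_prime)            # floor(k/2) + 1 - |A'|
    b = (k + 1) // 2 + 1 - len(B_prime)      # ceil(k/2) + 1 - |B'|
    # Alice pads with the first a elements of I = {m+1, ..., m+1+floor(k/2)}.
    A = set(A_prime) | set(range(m + 1, m + a + 1))
    # Bob pads with the last b elements of J = {m+2+floor(k/2), ..., n}.
    B = set(B_prime) | set(range(n - b + 1, n + 1))
    return A, B
```

Since the padding elements come from the disjoint blocks $I$ and $J$, the intersection of the padded sets is the same as that of the original sets.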


11.1.3  Gap  Equality 

In the Gap Equality problem, Alice and Bob are given $n$-bit strings $x$ and $y$ respectively
and wish to compute

\[
\mathrm{GEQ}_{n,t}(x, y) :=
\begin{cases}
1 & \text{if } x = y, \\
0 & \text{if } \mathrm{dist}(x, y) = t, \\
* & \text{otherwise.}
\end{cases}
\]

We drop the subscripts when $n$ is clear from context and $t = n/8$. When $\mathrm{GEQ}(x, y) = *$,
we allow the protocol to output 0 or 1. We are interested in $R^z(\mathrm{GEQ})$; recall that $R^z(\mathrm{GEQ})$
is the minimum communication cost of a protocol for GEQ that only makes mistakes when
$\mathrm{GEQ}(x, y) = z$. The standard public-coin EQUALITY protocol gives $R^0(\mathrm{GEQ}) = O(1)$.
For protocols that only err when $\mathrm{GEQ}(x, y) = 1$, the complexity is drastically different.
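
For readers unfamiliar with it, the public-coin EQUALITY protocol referred to above can be sketched as follows. This is one common way to present it, not necessarily the formulation the cited works use; the function name, the `rounds` parameter, and the use of a seeded generator to model the shared random string are assumptions made for the illustration.

```python
import random

def public_coin_equality(x, y, rounds=2, seed=0):
    """One-way public-coin equality test on bit lists x and y.

    Returns (answer, bits_sent). The answer is always 1 when x == y;
    when x != y it is wrongly 1 with probability 2**(-rounds).
    """
    rng = random.Random(seed)            # models the shared public random string
    n = len(x)
    for _ in range(rounds):
        r = [rng.randrange(2) for _ in range(n)]
        alice_bit = sum(xi & ri for xi, ri in zip(x, r)) % 2   # Alice sends this bit
        bob_bit = sum(yi & ri for yi, ri in zip(y, r)) % 2     # Bob computes this locally
        if alice_bit != bob_bit:
            return 0, rounds             # the strings certainly differ
    return 1, rounds
```

With two rounds the protocol sends two bits and errs with probability at most 1/4, and only on inputs with $x \ne y$, which is the side of the error allowed by $R^0$.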

Buhrman, Cleve, and Wigderson [35] proved an $\Omega(n)$ lower bound on the deterministic
communication complexity of $\mathrm{GEQ}_{n,n/2}$; their result extends to other gap sizes and to
randomized protocols with one-sided error.

Lemma 11.6 ([35]). $R^1(\mathrm{GEQ}_{n,t}) = \Omega(n)$ for all even $t = \Theta(n)$.¹


11.2  Main  Reduction  Lemma 


Below  we  define  a  class  of  property  testing  communication  games  and  show  how  commu¬ 
nication  lower  bounds  for  these  games  yield  query  complexity  lower  bounds  for  property 
testers.  Our  communication  games  are  based  on  what  we  call  combining  operators. 

¹ Curiously, the condition on the parity of $t$ turns out to be necessary. Since $\mathrm{dist}(x, y) = |x| + |y| - 2|x \wedge y|$, Alice and
Bob can deterministically distinguish $x = y$ from $\mathrm{dist}(x, y)$ being odd with a single bit of communication:
Alice sends Bob the parity of $|x|$, and Bob computes the parity of $|x| + |y|$. This does not affect our property
testing lower bounds.

Definition 11.7 (Combining operator). A combining operator is an operator $\psi$ that takes
as input two functions $f, g : \{0,1\}^n \to Z$ and returns a function $h : \{0,1\}^n \to R$.


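We refer to the inputs $f$ and $g$ as the base functions of $\psi$. By convention, we use $h$ to
refer to the output of $\psi$. Given a combining operator $\psi$ and a property $\mathcal{P}$, we define $C^{\mathcal{P}}_{\psi}$ to
be the following property testing communication game. Alice receives a function $f$. Bob
receives a function $g$. They need to compute

\[
C^{\mathcal{P}}_{\psi}(f, g) :=
\begin{cases}
1 & \text{if } \psi(f, g) \in \mathcal{P}, \\
0 & \text{if } \psi(f, g) \text{ is } \epsilon\text{-far from } \mathcal{P}.
\end{cases}
\]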


We prove all of our testing lower bounds by reducing from an associated communication
game $C^{\mathcal{P}}_{\psi}$. This reduction is simple: Alice and Bob solve $C^{\mathcal{P}}_{\psi}$ by emulating a $\mathcal{P}$-testing
algorithm on $h := \psi(f, g)$. Note that neither Alice nor Bob has enough information to
evaluate a query $h(x)$, because $h$ depends on both $f$ and $g$. Instead, they must communicate
to jointly compute $h(x)$. For this reduction to give a strong query complexity lower
bound for the property testing problem, it is essential that the joint computation of $h(x)$
occurs in a communication-efficient manner.

The  following  definition  gives  a  sufficient  condition  on  combining  operators  that  yield 
strong  reductions  to  testing  problems. 

Definition 11.8 (Simple combining operator). A combining operator $\psi$ is simple if for all
$f$, $g$, and for all $x$, the query $h(x)$ can be computed given only $x$ and the queries $f(x)$ and
$g(x)$.

For example, when the base functions are boolean, the combining operator defined by
$\psi(f, g) := f \oplus g$ is clearly simple: each $h(x) = f(x) \oplus g(x)$ can trivially be computed
from $f(x)$ and $g(x)$. On the other hand, the combining operator $\psi$ that returns the function
defined by $h(x) := \bigoplus_{y \in T} [f(y) \cdot g(y)]$ is not simple when $T$ is a large set of strings (say a
Hamming ball centered at $x$), since computing $h(x)$ requires knowledge of $f(y)$ and $g(y)$
for several $y$.

All  of  the  property  testing  communication  games  we  use  in  this  paper  are  based  on 
simple  combining  operators  and  give  us  a  tight  connection  between  property  testing  and 
communication  complexity  via  the  following  lemma. 

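Lemma 11.9 (Main Reduction Lemma). Fix $Z$ to be a finite set. For any simple combining
operator $\psi$ with base functions $f, g : \{0,1\}^n \to Z$ and any property $\mathcal{P}$, we have

(i) $R(C^{\mathcal{P}}_{\psi}) \le 2Q(\mathcal{P}) \cdot \lceil \log |Z| \rceil$,

(ii) $R^0(C^{\mathcal{P}}_{\psi}) \le 2Q^1(\mathcal{P}) \cdot \lceil \log |Z| \rceil$,

(iii) $R^{\to}(C^{\mathcal{P}}_{\psi}) \le Q^{\mathrm{na}}(\mathcal{P}) \cdot \lceil \log |Z| \rceil$, and

(iv) $R^{\to,0}(C^{\mathcal{P}}_{\psi}) \le Q^{\mathrm{na},1}(\mathcal{P}) \cdot \lceil \log |Z| \rceil$.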


Proof. We begin by proving (iii). Let $\mathcal{A}$ be a $q$-query non-adaptive tester for $\mathcal{P}$. We
create a one-way protocol $P$ for $C^{\mathcal{P}}_{\psi}$ in the following manner. Alice and Bob use public
randomness to generate queries $x^{(1)}, \ldots, x^{(q)}$. Then, Alice computes $f(x^{(1)}), \ldots, f(x^{(q)})$
and sends them to Bob in a single $(q \cdot \lceil \log |Z| \rceil)$-bit message. For each $i$, Bob computes
$g(x^{(i)})$ and combines it with $f(x^{(i)})$ to compute $h(x^{(i)})$. Finally, Bob emulates $\mathcal{A}$ using the
responses $h(x^{(1)}), \ldots, h(x^{(q)})$ and outputs 1 if and only if $\mathcal{A}$ accepts $h$.

If $\mathcal{A}$ has two-sided error, then by the correctness of $\mathcal{A}$, $P$ computes $C^{\mathcal{P}}_{\psi}$ with probability
at least 2/3. Hence, $R^{\to}(C^{\mathcal{P}}_{\psi}) \le q \cdot \lceil \log |Z| \rceil$. In particular, if $\mathcal{A}$ is an optimal non-adaptive
tester with two-sided error, then $q = Q^{\mathrm{na}}(\mathcal{P})$, and part (iii) of the lemma is proved.

If $\mathcal{A}$ has one-sided error, then whenever $h \in \mathcal{P}$, the protocol $P$ correctly outputs 1,
and when $h$ is $\epsilon$-far from $\mathcal{P}$, the protocol correctly outputs 0 with probability at least 2/3.
Therefore, $R^{\to,0}(C^{\mathcal{P}}_{\psi}) \le q \cdot \lceil \log |Z| \rceil$. In particular, when $\mathcal{A}$ is an optimal non-adaptive
tester with one-sided error, $R^{\to,0}(C^{\mathcal{P}}_{\psi}) \le Q^{\mathrm{na},1}(\mathcal{P}) \cdot \lceil \log |Z| \rceil$.

Now, suppose $\mathcal{A}$ is a $q$-query adaptive tester for $\mathcal{P}$. Again, Alice and Bob use public
randomness to generate queries $x^{(1)}, \ldots, x^{(q)}$. However, since $\mathcal{A}$ is adaptive, the distribution
of the $i$th query depends on $h(x^{(j)})$ for all $j < i$. Instead of generating all queries
in advance, Alice and Bob generate queries one at a time. Each time a query $x^{(i)}$ is generated,
Alice and Bob exchange $f(x^{(i)})$ and $g(x^{(i)})$. Since $\psi$ is a simple combining operator,
this is enough information for Alice and Bob to individually compute $h(x^{(i)})$, which in
turn gives them enough information to generate the next query with the appropriate distribution.
When $h(x^{(1)}), \ldots, h(x^{(q)})$ have all been computed, Bob outputs 1 if and only if $\mathcal{A}$
accepts $h$. This protocol costs $2q \cdot \lceil \log |Z| \rceil$ bits of communication, and if $\mathcal{A}$ is an optimal
adaptive tester, then $R(C^{\mathcal{P}}_{\psi}) \le 2Q(\mathcal{P}) \cdot \lceil \log |Z| \rceil$. Similarly, if $\mathcal{A}$ is an optimal adaptive
tester with one-sided error, then $R^0(C^{\mathcal{P}}_{\psi}) \le 2Q^1(\mathcal{P}) \cdot \lceil \log |Z| \rceil$. $\Box$
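
The one-way protocol built in the proof of part (iii) can be simulated directly. The sketch below is only illustrative: it uses the XOR combining operator and, as a stand-in for an arbitrary non-adaptive tester, a single round of the three-query linearity check $h(x) \oplus h(y) = h(x \oplus y)$. All function names are our own, and inputs in $\{0,1\}^n$ are encoded as integers (bitmasks).

```python
import random

def one_way_protocol(f, g, combine, tester_queries, tester_decide, z_size, seed=0):
    """Emulate a non-adaptive tester on h = psi(f, g) with one message from Alice to Bob.

    Returns (tester_accepts, bits_sent).
    """
    rng = random.Random(seed)                 # public randomness shared by both players
    queries = tester_queries(rng)             # non-adaptive: all queries are fixed up front
    alice_message = [f(x) for x in queries]   # Alice -> Bob: one element of Z per query
    bits_sent = len(queries) * (z_size - 1).bit_length()   # q * ceil(log2 |Z|)
    # Bob combines Alice's values with his own to obtain h(x) for every query.
    answers = [combine(x, fx, g(x)) for x, fx in zip(queries, alice_message)]
    return tester_decide(queries, answers), bits_sent

def parity_of(mask):
    """Parity of the input bits selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2

def make_linearity_check(n):
    """A toy 3-query non-adaptive tester (hypothetical stand-in for a real tester)."""
    def queries(rng):
        x, y = rng.randrange(2 ** n), rng.randrange(2 ** n)
        return [x, y, x ^ y]
    def decide(_queries, answers):
        return (answers[0] ^ answers[1]) == answers[2]
    return queries, decide
```

The cost matches the bound in the lemma: with $q = 3$ queries and $|Z| = 2$, Alice sends exactly three bits, and Bob never has to reply.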

11.3  Testing k-Linearity

In  this  section  we  prove  Theorem  11.1. 

Theorem 11.1 (Restated). For any $0 < \epsilon < \frac{1}{2}$, $\epsilon$-testing $k$-linearity requires
$\Omega(\min\{k, n - k\})$ queries.

Proof. We prove the lower bound with a reduction from the $k$-BAL-DISJ problem. Recall
that $C^{k\text{-lin}}_{\oplus}$ is the communication game where the inputs are the functions $f, g : \{0,1\}^n \to \{0,1\}$
and the players must test whether the function $h = f \oplus g$ is $k$-linear. Lemmas 11.9
and 11.5 imply that

\[ 2Q(k\text{-linearity}) \ge R(C^{k\text{-lin}}_{\oplus}) \quad\text{and}\quad R(k\text{-BAL-DISJ}) = \Omega(\min\{k, n - k\}). \]

To complete the proof, we show that $R(C^{k\text{-lin}}_{\oplus}) \ge R(k\text{-BAL-DISJ})$ with a reduction from
$k$-BAL-DISJ to $C^{k\text{-lin}}_{\oplus}$.

Let $A, B \subseteq [n]$ be the two sets of size $|A| = \lfloor k/2 \rfloor + 1$ and $|B| = \lceil k/2 \rceil + 1$ received by
Alice and by Bob, respectively, as the input to an instance of $k$-BAL-DISJ. Alice and Bob
can construct the functions $\mathrm{Parity}_A, \mathrm{Parity}_B : \{0,1\}^n \to \{0,1\}$. When $|A \cap B| = 1$, the
symmetric difference of the two sets has size $|A \triangle B| = |A| + |B| - 2|A \cap B| = k$, and the
function $\mathrm{Parity}_A \oplus \mathrm{Parity}_B = \mathrm{Parity}_{A \triangle B}$ is $k$-linear. Conversely, when $A$ and $B$ are disjoint,
the function $\mathrm{Parity}_A \oplus \mathrm{Parity}_B$ is a $(k + 2)$-parity function and, by Proposition 2.24, it
is $\frac{1}{2}$-far from $k$-linear functions. So Alice and Bob can solve their instance of $k$-BAL-DISJ
with a communication protocol for $C^{k\text{-lin}}_{\oplus}$. This implies that $R(C^{k\text{-lin}}_{\oplus}) \ge R(k\text{-BAL-DISJ})$,
as we wanted to show. $\Box$
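
The key identity used in the proof, $\mathrm{Parity}_A \oplus \mathrm{Parity}_B = \mathrm{Parity}_{A \triangle B}$, together with the two possible sizes of $A \triangle B$, can be checked by brute force on small instances. The sketch below is illustrative only; the function names are ours and coordinates are numbered from 0 for convenience.

```python
from itertools import product

def parity(S, x):
    """Parity of the coordinates of x indexed by the set S."""
    return sum(x[i] for i in S) % 2

def check_parity_xor(A, B, n):
    """Return (identity_holds_on_all_inputs, size_of_symmetric_difference)."""
    sym_diff = A ^ B
    identity_holds = all(
        parity(A, x) ^ parity(B, x) == parity(sym_diff, x)
        for x in product((0, 1), repeat=n)
    )
    return identity_holds, len(sym_diff)
```

For $k = 4$, sets of size 3 that share one element give a 4-parity, while disjoint sets of size 3 give a 6-parity.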


11.4  Testing  Monotonicity 

Theorem 11.2 (Restated). Fix $R \subseteq \mathbb{R}$. For any $0 < \epsilon < \frac{1}{8}$, $\epsilon$-testing the function
$f : \{0,1\}^n \to R$ for monotonicity requires $\Omega(\min\{n, |R|^2\})$ queries.

Proof. We prove the theorem in three steps. First, we give an $\Omega(n)$ lower bound for the
case when $R = \mathbb{Z}$. Secondly, we handle the case where $|R| = \sqrt{n}$ by a standard range
reduction argument. Finally, we give an $\Omega(|R|^2)$ bound for small $|R|$ by reducing from the
$|R| = \sqrt{n}$ case.

Suppose $R = \mathbb{Z}$. We prove the lower bound for testing monotonicity in this case with
a reduction from SET-DISJOINTNESS. Let $\psi$ be the combining function that, given two
functions $f, g : \{0,1\}^n \to \{0,1\}$ and an element $x \in \{0,1\}^n$, returns
$\psi(f, g, x) = 2|x| + f(x) + g(x)$. Define $C^{\mathrm{MONO}}_{\psi}$ to be the communication game where Alice and Bob are given two
functions $f, g : \{0,1\}^n \to \{0,1\}$ and they must test whether the function $h : \{0,1\}^n \to \mathbb{R}$
defined by $h(x) = \psi(f, g, x)$ is monotone. By Lemma 11.9 and Theorem 11.4,

\[ 2Q(\mathrm{MONO}) \ge R(C^{\mathrm{MONO}}_{\psi}) \quad\text{and}\quad R(\mathrm{DISJ}) \ge \Omega(n). \]

We complete the proof by showing that $R(C^{\mathrm{MONO}}_{\psi}) \ge R(\mathrm{DISJ})$.

Let $A, B \subseteq [n]$ be the subsets received by Alice and Bob as the input to an instance
of the SET-DISJOINTNESS problem. Alice and Bob can build the functions
$\chi_A, \chi_B : \{0,1\}^n \to \{-1, 1\}$, respectively, by setting $\chi_A(x) = (-1)^{\sum_{i \in A} x_i}$ and $\chi_B(x) = (-1)^{\sum_{i \in B} x_i}$.
Let $h : \{0,1\}^n \to \mathbb{R}$ be the function defined by $h(x) = \psi(\chi_A, \chi_B, x) = 2|x| + \chi_A(x) + \chi_B(x)$.
We claim that (a) when $A$ and $B$ are disjoint, $h$ is monotone, and (b) when $A$ and
$B$ are not disjoint, $h$ is $\frac{1}{8}$-far from monotone. If this claim is true, then we have completed
our lower bound since it implies that Alice and Bob can run a protocol for $C^{\mathrm{MONO}}_{\psi}$ to solve
their instance of SET-DISJOINTNESS and, therefore, $R(C^{\mathrm{MONO}}_{\psi}) \ge R(\mathrm{DISJ})$.

We now prove Claim (a). Fix $i \in [n]$. For $x \in \{0,1\}^n$, let $x_0, x_1 \in \{0,1\}^n$ be the
vectors obtained by fixing the $i$th coordinate of $x$ to 0 and to 1, respectively. For any set
$S \subseteq [n]$, $\chi_S(x_1) = (-1)^{\mathbf{1}[i \in S]} \cdot \chi_S(x_0)$. So when $i \notin A$ and $i \notin B$,

\[ h(x_1) - h(x_0) = 2|x_1| - 2|x_0| = 2 > 0, \]

when $i \in A$ and $i \notin B$,

\[ h(x_1) - h(x_0) = 2|x_1| - 2|x_0| - 2\chi_A(x_0) \ge 0, \]

and when $i \notin A$ and $i \in B$,

\[ h(x_1) - h(x_0) = 2|x_1| - 2|x_0| - 2\chi_B(x_0) \ge 0. \]

Those three inequalities imply that when $i \notin A \cap B$, the function $h$ is monotone on each
edge $(x_0, x_1)$ in the $i$th direction. As a result, when $A$ and $B$ are disjoint the function $h$ is
monotone.

Let us now prove Claim (b). Let $A \cap B \ne \emptyset$. When $i \in A \cap B$,

\[ h(x_1) - h(x_0) = 2|x_1| - 2|x_0| - 2\chi_A(x_0) - 2\chi_B(x_0). \]

This implies that for each $x$ where $\chi_A(x_0) = \chi_B(x_0) = 1$, $h(x_1) < h(x_0)$. Partition
$\{0,1\}^n$ into $2^{n-1}$ pairs that form the endpoints of all the edges in the $i$th direction. Exactly
$\frac{1}{4}$ of these pairs will satisfy the condition $\chi_A(x_0) = \chi_B(x_0) = 1$, and for each of these
pairs, either $h(x_0)$ or $h(x_1)$ must be modified to make $h$ monotone. So when $A$ and $B$ are
not disjoint, $h$ is $\frac{1}{8}$-far from monotone.
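
Both claims can be checked by brute force for small $n$. The sketch below is an illustration under our own naming, with coordinates numbered from 0: it counts the hypercube edges on which $h(x) = 2|x| + \chi_A(x) + \chi_B(x)$ decreases.

```python
from itertools import product

def chi(S, x):
    """The character (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def violated_edges(A, B, n):
    """Count edges (x0, x1) of the hypercube with h(x1) < h(x0)."""
    def h(x):
        return 2 * sum(x) + chi(A, x) + chi(B, x)
    count = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]   # flip coordinate i from 0 to 1
                if h(y) < h(x):
                    count += 1
    return count
```

With disjoint sets no edge is violated. In the example used in the test, where $A$ and $B$ share exactly one coordinate and each also contains another coordinate, a quarter of the $2^{n-1}$ edges in the shared direction are violated, and these edges are pairwise vertex-disjoint.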

To handle the case where $|R| = \sqrt{n}$, we sketch the proof of a standard range reduction
argument (see, e.g., [33]). Specifically, we can assume without loss of generality that
$R = \{-\frac{\sqrt{n}}{2}, \ldots, \frac{\sqrt{n}}{2}\}$ and we modify the construction of the function $h$ to create $h'$:

\[
h'(x) =
\begin{cases}
-\frac{\sqrt{n}}{2} & \text{when } |x| - \frac{n}{2} \le -\frac{\sqrt{n}}{2} + 1, \\
\frac{\sqrt{n}}{2} & \text{when } |x| - \frac{n}{2} \ge \frac{\sqrt{n}}{2} - 1, \\
|x| - \frac{n}{2} + \frac{\chi_A(x) + \chi_B(x)}{2} & \text{when } \left| |x| - \frac{n}{2} \right| < \frac{\sqrt{n}}{2} - 1.
\end{cases}
\]

It is easy to see that $h'$ is identical to $h/2$, except when $\left| |x| - \frac{n}{2} \right| > \frac{\sqrt{n}}{2}$, which only occurs
for a constant fraction of $x$'s. Using the same reasoning as before, $h'$ is monotone when $A$
and $B$ are disjoint, and a constant distance from monotone when $A$ and $B$ intersect. We
leave the details to the reader.

Finally, suppose that $|R| = o(\sqrt{n})$, and let $m := |R|^2$. We will use a $q$-query testing
algorithm for $f$ to create a $q$-query testing algorithm for functions $g : \{0,1\}^m \to R$.

Specifically, given $g$, create $h : \{0,1\}^n \to R$ by defining $h(x, y) := g(x)$ for $x \in \{0,1\}^m$
and $y \in \{0,1\}^{n-m}$. Clearly, if $g$ is monotone then so is $h$. We now want to argue
that if $g$ is $\epsilon$-far from monotone, then so is $h$. We do so by proving the contrapositive.
Suppose that $h$ is not $\epsilon$-far from monotone. Let $\tilde{h}$ be the monotone function closest to $h$;
thus, $\Pr_{x,y}[h(x, y) \ne \tilde{h}(x, y)] < \epsilon$. By an averaging argument, there exists $y$ such that
$\Pr_x[h(x, y) \ne \tilde{h}(x, y)] < \epsilon$. Define $\tilde{g} : \{0,1\}^m \to R$ as $\tilde{g}(x) := \tilde{h}(x, y)$. It is easy to
see that $\tilde{g}$ is monotone and that $\Pr_x[g(x) \ne \tilde{g}(x)] = \Pr_x[h(x, y) \ne \tilde{h}(x, y)] < \epsilon$. Therefore, $g$ is not $\epsilon$-far from
monotone.

Our testing algorithm for $g$ is simple: test $h$ and return the result. By the above claim,
a correct answer for testing $h$ gives a correct answer for testing $g$. Since testing $g$ for
monotonicity requires $\Omega(m) = \Omega(|R|^2)$ queries, the same bound holds for testing $h$. $\Box$


11.5  Decision  trees 

Theorem 11.3 (Restated). For any $0 < \epsilon < \frac{1}{32}$, at least $\Omega(s)$ queries are required to $\epsilon$-test
size-$s$ decision trees with one-sided error.

Proof. We first consider the case where $s = 2^{n-1}$ for some $n \ge 5$. We prove this case
with a reduction from the GAP-EQUALITY problem on $s$-bit strings. Let $C^{s\text{-DT}}_{\oplus}$ be the
communication game where Alice and Bob receive the functions $f, g : \{0,1\}^n \to \{0,1\}$
and they must test whether the function $h = f \oplus g$ is computable by a decision tree of size
$s$. By Lemmas 11.9 and 11.6,

\[ 2Q^1(s\text{-DT}) \ge R^1(C^{s\text{-DT}}_{\oplus}) \quad\text{and}\quad R^1(\mathrm{GEQ}_{s,s/8}) = \Omega(s). \]

We complete the proof by showing that $R^1(C^{s\text{-DT}}_{\oplus}) \ge R^1(\mathrm{GEQ}_{s,s/8})$.

Let $a, b \in \{0,1\}^s$ be received by Alice and Bob as input to an instance of the GAP-EQUALITY
problem. They must determine if $a = b$ or whether $\mathrm{dist}(a, b) = \frac{s}{8}$. Alice and
Bob can solve their instance of the GEQ problem with the following protocol. Let the set
of vectors $x \in \{0,1\}^n$ with even parity $\mathrm{Parity}(x) = x_1 \oplus \cdots \oplus x_n = 0$ define an indexing
of the bits of $a$. (I.e., fix a bijection between those strings and $[s]$.) Alice and Bob build
the functions $f, g : \{0,1\}^n \to \{0,1\}$ by setting

\[
f(x) =
\begin{cases}
a_x & \text{when } \mathrm{Parity}(x) = 0, \\
0 & \text{when } \mathrm{Parity}(x) = 1,
\end{cases}
\qquad\text{and}\qquad
g(x) =
\begin{cases}
b_x & \text{when } \mathrm{Parity}(x) = 0, \\
1 & \text{when } \mathrm{Parity}(x) = 1.
\end{cases}
\]

Alice and Bob then test whether $f \oplus g$ can be represented with a decision tree of size at
most $\frac{15}{16} 2^n$; when it can, they answer $\mathrm{dist}(a, b) = \frac{s}{8}$.

Let us verify the correctness of this protocol. For any $x \in \{0,1\}^n$ where $\mathrm{Parity}(x) = 0$, we have that $(f \oplus g)(x) = a_x \oplus b_x$. Furthermore, for each $x$ where $\mathrm{Parity}(x) = 1$, we get $(f \oplus g)(x) = 1$. So when $a = b$, then $f \oplus g$ is the Parity function. By Lemma ??, this
function  is  ^-far  from  every  decision  tree  of  size  at  most  f|2n.  When  dist(a,  b)  =  |,  con¬ 
sider  the  (complete)  tree  that  computes  /  ©  g  by  querying  xt  in  every  node  at  level  i.  This 
tree has $2^n$ leaves, but for every input $x$ where $a_x \neq b_x$, we have that the corresponding leaf has the same value as its sibling. So for each such input, we can eliminate one leaf. Therefore, we can compute $f \oplus g$ with a decision tree of size at most $2^n - 2^{n-1}/8 = \frac{15}{16}2^n$. □
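
The construction in this proof can be checked on a small instance. The following is a minimal sketch, not part of the original argument: the helper names `xor_function` and `leaves_after_merging` are hypothetical, the bijection between even-parity strings and $[s]$ is taken to be lexicographic order, and the tree is assumed to query $x_1, \ldots, x_n$ in order so that sibling leaves differ only in $x_n$.

```python
from itertools import product

def parity(x):
    """Parity(x) = x_1 xor ... xor x_n."""
    return sum(x) % 2

def xor_function(a, b, n):
    """Truth table of f xor g for the functions f, g built from a and b."""
    # Index the bits of a and b by the even-parity strings (a bijection with [s]).
    evens = [x for x in product((0, 1), repeat=n) if parity(x) == 0]
    index = {x: i for i, x in enumerate(evens)}
    table = {}
    for x in product((0, 1), repeat=n):
        if parity(x) == 0:
            f_x, g_x = a[index[x]], b[index[x]]
        else:
            f_x, g_x = 0, 1
        table[x] = f_x ^ g_x
    return table

def leaves_after_merging(table, n):
    """Leaves left in the complete tree after replacing every pair of
    equal-valued sibling leaves (inputs differing only in x_n) by one leaf."""
    leaves = 0
    for prefix in product((0, 1), repeat=n - 1):
        same = table[prefix + (0,)] == table[prefix + (1,)]
        leaves += 1 if same else 2
    return leaves

n = 5
s = 2 ** (n - 1)
a = [0, 1] * (s // 2)
b_equal = list(a)
b_far = list(a)
for i in range(s // 8):          # flip s/8 bits so that dist(a, b_far) = s/8
    b_far[i] ^= 1

table_equal = xor_function(a, b_equal, n)
table_far = xor_function(a, b_far, n)
```

With these inputs, `table_equal` coincides with the parity function and no sibling pair can be merged, while `table_far` loses exactly one leaf per position where the two strings differ.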


11.6  Notes  and  Discussion 

Other results. As we saw in Chapter 7, lower bounds on the query complexity for testing $k$-linearity (or, more precisely, on the number of queries required to distinguish $k$-linear and $(k+2)$-linear functions) yield lower bounds on the query complexity for testing a number of other properties of boolean functions.

Our proof of Theorem 11.1 does give a lower bound on the number of queries required to distinguish $k$-linear and $(k+2)$-linear functions. As a result, all the (asymptotic) bounds in Table 7.1 can be obtained directly from Theorem 11.1.

The lower bound on the query complexity required to test monotonicity in Theorem 11.2 also implies a matching $\Omega(n)$ lower bound on the query complexity for testing submodularity. This follows from a result of Seshadhri and Vondrák [91], who showed that testing submodularity is at least as hard (in terms of query complexity) as testing monotonicity. For the details, see [23].

Following the initial publication of the research presented in this chapter, the communication complexity method has been used by Brody, Matulef, and Wu [34] to establish new lower bounds related to testing properties computable by small-width ordered binary decision diagrams. It has also been used by Jha and Raskhodnikova [68] to obtain lower bounds for testing Lipschitz functions, as part of their investigation on property testing and data privacy.


Chapter 12
Active  Testing 


In this chapter, we introduce a new model of property testing, which we call the active testing model. Before describing the model itself, we begin by describing the model selection problem in learning theory which motivated the research presented in this chapter.


Motivation. The central problem in learning theory can be formulated as follows. An algorithm $A$ is given query access to some function $f : \{0,1\}^n \to \{0,1\}$ that is promised to have some given property $\mathcal{P}$.¹ The learner $A$'s task is to identify a hypothesis function $h : \{0,1\}^n \to \{0,1\}$ that, with probability at least $1 - \delta$, is $\epsilon$-close to $f$. The goal is to minimize the number of queries to $f$ and the running time required by $A$ to complete this task. When $A$ is free to query $f$ on any inputs of its choosing, the resulting learning framework is called the membership query model. If the hypothesis $h$ is guaranteed to also have property $\mathcal{P}$, the algorithm $A$ is called a proper learner.

Consider now a slight variation on the basic learning problem: a learning algorithm $A$ is again given query access to some function $f : \{0,1\}^n \to \{0,1\}$, but the promise is weakened to only guarantee that $f$ has at least one of the properties $\mathcal{P}_1, \ldots, \mathcal{P}_m$. As before, $A$ is required to find a hypothesis $h : \{0,1\}^n \to \{0,1\}$ that is $\epsilon$-close to $f$ with probability at least $1 - \delta$. How many queries does $A$ require for this task?

One way $A$ can learn a good hypothesis $h$ for $f$ is to learn $m$ hypotheses $h_1, \ldots, h_m$, one for each of the possible assumptions $f \in \mathcal{P}_1, \ldots, f \in \mathcal{P}_m$. It can then test these hypotheses to identify the good hypothesis. This approach is sound, but it is not query-efficient. If $q_1, \ldots, q_m$ are the minimum numbers of queries required to learn a good hypothesis for a function $f$ in $\mathcal{P}_1, \ldots, \mathcal{P}_m$, respectively, then this approach requires at least $q_1 + \cdots + q_m$ queries.

¹ In learning theory, a property of boolean functions is typically called a class of functions.

A more query-efficient solution to the problem is to perform model selection: instead of learning one hypothesis per property, we first identify a property $\mathcal{P}_i$ of the function $f$, then we learn a good hypothesis for $f$ under the promise that $f \in \mathcal{P}_i$.

For the model selection approach to be query-efficient, we need a good way to tell which of the properties $\mathcal{P}_1, \ldots, \mathcal{P}_m$ the function $f$ has. This is where property testing comes in handy: by testing whether $f$ is in $\mathcal{P}_1, \ldots, \mathcal{P}_m$, we can quickly reject all the properties that do not contain any hypothesis $\epsilon$-close to $f$ while accepting the property (or properties) that contains $f$. Furthermore, very efficient testers have been designed for many properties commonly studied in machine learning: linear threshold functions [77], unions of intervals [74], juntas [52, 20], DNFs [46], and decision trees [46].


Active learning. While the application of property testing to model selection as described above is interesting, its potential utility is limited by the fact that the assumptions in the membership query model are unrealistic in most machine learning applications.

For example, consider the problem of classifying documents by topic. A document can be represented as a boolean vector $x \in \{0,1\}^n$ by letting each $x_i$ denote the presence (or absence) of a given word in the document. A boolean classifier (for illustration, let's consider "sports articles" vs. "non-sports articles") can be described as a boolean function $f : \{0,1\}^n \to \{0,1\}$. Let $A$ be a learning algorithm that is tasked with constructing a classifier $h : \{0,1\}^n \to \{0,1\}$ that is close to $f$. To execute $A$, we must give it query access to $f$. When we have an existing article (say, taken from the web) whose corresponding word vector is $x \in \{0,1\}^n$, we can let $A$ query the value of $f(x)$ by having a user determine if the article is about sports or not. This approach, however, does not work if we generate an arbitrary input $x \in \{0,1\}^n$ and then try to get users to determine the value of $f(x)$.

As  a  result,  the  dominant  query  paradigm  in  machine  learning  is  not  the  membership 
query  model,  but  the  active  learning  model.  In  this  model,  the  algorithm  can  still  choose 
the  inputs  on  which  it  wants  to  query  the  function  /,  but  it  can  only  choose  inputs  among 
those  that  exist  in  nature.  We  describe  this  model  more  precisely  in  Section  12.1. 


Results.  In  this  chapter,  we  bring  the  active  model  in  learning  to  the  domain  of  testing. 
The  main  contribution  of  the  research  presented  in  this  chapter  is  the  definition  of  the 
model  itself,  which  we  present  in  Section  12.1. 

The active testing model is a restricted form of property testing, so it is natural to expect that some properties which are efficiently testable in the standard property testing model are not as efficiently testable in the active testing model. Our first (negative) result confirms this suspicion. Recall that dictator functions can be tested with a constant number of queries in the standard property testing model. In the active testing model, however, testing dictator functions requires as many queries as are needed to learn dictator functions.

Theorem 12.1. Active testing of dictatorships under the uniform distribution requires $\Omega(\log n)$ queries.

The proof of the theorem says something even stronger: $\Omega(\log n)$ queries are required in the active testing model to distinguish dictator functions from random functions. As a result, the same lower bound applies to the query complexity of many other property testing tasks: testing computability by small decision trees, testing juntas, testing linear threshold functions, etc. Despite this note of caution, it is possible that many of those properties can still be tested much more efficiently than they can be learned.

The second, and principal, result of this section confirms that there are indeed fundamental properties of boolean functions that can be tested more efficiently than they can be learned in the active testing model. Specifically, we show that testing linear threshold functions is much easier than learning the same class of functions.

Theorem 12.2. There is an active tester for linear threshold functions over the standard $n$-dimensional Gaussian distribution that makes $O(\sqrt{n \log n})$ queries. Furthermore, any such active tester for linear threshold functions makes at least $\tilde{\Omega}(n^{1/3})$ queries.

There  are  other  properties  of  functions  that  can  be  tested  efficiently  in  the  active  model. 
We  discuss  these  results  and  the  relation  of  active  testing  to  other  models  of  property 
testing  in  Section  12.5. 


12.1  The  active  testing  model 

Active learning, in general, describes models of machine learning in which the learning algorithm has some freedom in choosing the inputs on which it queries the target function. The name active is chosen to contrast with the passive learning model, in which the learning algorithm observes the value of the target function on some inputs that are drawn at random from some distribution. The "most active" model of learning, in which the learning algorithm has complete freedom over the set of queries it makes to the target function, is called the membership query learning model.

The model of active learning that we consider in this chapter lies somewhere in between the passive and membership query learning models. Specifically, we consider the following class of learning algorithms.

Definition 12.3 (Sample-restricted learner). The randomized algorithm $A$ with oracle access to a function $f : \{0,1\}^n \to \{0,1\}$ in $\mathcal{P}$ is an $s$-sample, $q$-query $(\epsilon, \delta)$-learner for the property $\mathcal{P}$ over the distribution $\mathcal{D}$ if it draws a set $S$ of $|S| = s$ samples from $\mathcal{D}$, queries the value of $f$ on $q$ elements from $S$, and with probability at least $1 - \delta$ outputs a hypothesis $h : \{0,1\}^n \to \{0,1\}$ that satisfies $\mathrm{dist}_{\mathcal{D}}(f, h) \le \epsilon$.

Our  concept  of  active  learner  corresponds  to  a  sample-restricted  learner  that  has  access 
to  a  polynomial  number  of  (unlabeled)  samples. 

Definition 12.4 (Active learner). The algorithm $A$ is a $q$-query active $(\epsilon, \delta)$-learner for $\mathcal{P}$ over the distribution $\mathcal{D}$ if it is a $\mathrm{poly}(n)$-sample, $q$-query $(\epsilon, \delta)$-learner for $\mathcal{P}$ over $\mathcal{D}$.

We  are  now  ready  to  introduce  the  main  definitions  for  the  active  testing  model.  The 
goal  of  our  definitions  is  to  mirror  the  active  learning  definitions.  We  do  so  as  follows. 

Definition 12.5 (Sample-restricted tester). The randomized algorithm $A$ with oracle access to a function $f : \{0,1\}^n \to \{0,1\}$ is an $s$-sample, $q$-query $\epsilon$-tester for $\mathcal{P}$ over the distribution $\mathcal{D}$ if it draws $s$ samples from $\mathcal{D}$, queries the value of $f$ on $q$ of those samples, and

1. Accepts with probability at least $\frac{2}{3}$ when $f \in \mathcal{P}$; and

2. Rejects with probability at least $\frac{2}{3}$ when $\mathrm{dist}_{\mathcal{D}}(f, \mathcal{P}) \ge \epsilon$.

Note that our definition of sample-restricted tester coincides with the standard definition of a tester when the number of samples is unlimited and the support of $\mathcal{D}$ is $\{0,1\}^n$.

When we fix $q = s$, then the definition of sample-restricted tester corresponds to the passive testing model first studied in [59] in which the queries made by the algorithm are completely determined by the random draws to the distribution $\mathcal{D}$.
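
The access restriction in Definition 12.5 can be made concrete with a small wrapper around the target function. This is only an illustrative sketch under stated assumptions: the class name `SampleRestrictedOracle` and the helper `draw_uniform_samples` are hypothetical, the distribution is taken to be uniform over $\{0,1\}^n$, and repeated queries to the same sample are counted against the budget $q$ each time.

```python
import random

class SampleRestrictedOracle:
    """Grants at most q queries to f, all on points of the drawn sample S."""

    def __init__(self, f, samples, q):
        self.samples = list(samples)        # the s unlabeled samples
        self._f = f
        self._allowed = set(self.samples)
        self.queries_left = q

    def query(self, x):
        if x not in self._allowed:
            raise ValueError("queries are restricted to the drawn samples")
        if self.queries_left == 0:
            raise RuntimeError("query budget q exhausted")
        self.queries_left -= 1
        return self._f(x)

def draw_uniform_samples(n, s, rng):
    """Draw s points independently and uniformly from {0,1}^n."""
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(s)]

rng = random.Random(0)
samples = draw_uniform_samples(n=8, s=5, rng=rng)
dictator = lambda x: x[0]                   # example target function
oracle = SampleRestrictedOracle(dictator, samples, q=2)
```

A membership-query algorithm corresponds to dropping the `ValueError` check, and a passive tester to forcing $q = s$ so that every sample is queried.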

Our  concept  of  active  tester  corresponds  to  a  sample-restricted  tester  that  has  access  to 
a  polynomial  number  of  (unlabeled)  samples. 

Definition 12.6 (Active tester). A randomized algorithm is a $q$-query active $\epsilon$-tester for $\mathcal{P} \subseteq \{0,1\}^n \to \{0,1\}$ over $\mathcal{D}$ if it is a $\mathrm{poly}(n)$-sample, $q$-query $\epsilon$-tester for $\mathcal{P}$ over $\mathcal{D}$.


12.2  Testing  dictator  functions


We prove the lower bound for testing dictator functions with a variant of Lemma 3.15. We start by stating and proving this basic lemma.

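Lemma 12.7. Fix a property $\mathcal{P} \subseteq \{0,1\}^n \to \{0,1\}$. Let $\mathcal{D}_{\mathrm{yes}}$ be a distribution over $\mathcal{P}$ and let $\mathcal{D}_{\mathrm{no}}$ be a distribution such that $f \sim \mathcal{D}_{\mathrm{no}}$ is $\epsilon$-far from $\mathcal{P}$ with probability $1 - o(1)$. Let $S \subseteq \{0,1\}^n$ be obtained by drawing $s$ samples independently at random from the distribution $\mathcal{D}$ over $\{0,1\}^n$. Suppose that with probability at least $\frac{5}{6}$ (over the choice of $S$), every subset $X \subseteq S$ of size $|X| = q$ and every $r \in \{0,1\}^q$ satisfy

$$\Pr_{f \sim \mathcal{D}_{\mathrm{yes}}}[f(X) = r] < \tfrac{6}{5} \Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[f(X) = r]. \qquad (12.1)$$

Then there is no $s$-sample $q$-query $\epsilon$-tester for $\mathcal{P}$.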

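Proof. Let $A$ be a deterministic algorithm that, for each possible subset $S \subseteq \{0,1\}^n$, selects a set $X \subseteq S$ of size $|X| = q$ and queries $f : \{0,1\}^n \to \{0,1\}$ on each input in $X$. Suppose that $A$ is guaranteed to accept $f \sim \mathcal{D}_{\mathrm{yes}}$ with probability at least $\frac{2}{3}$ when $S$ contains $s$ elements drawn independently from $\mathcal{D}$.²

When every subset $X \subseteq S$ of size $|X| = q$ and every $r \in \{0,1\}^q$ satisfy (12.1), let us call $S$ bad. By our assumption on the acceptance probability of $A$,

$$\mathbf{E}_S\Big[\Pr_{f \sim \mathcal{D}_{\mathrm{yes}}}[A \text{ accepts } f] \;\Big|\; S \text{ is bad}\Big] \ge \frac{3}{5}.$$

But conditioned on $S$ being bad, the probability that $A$ accepts a function drawn from $\mathcal{D}_{\mathrm{no}}$ cannot be much smaller than that of accepting a function drawn from $\mathcal{D}_{\mathrm{yes}}$, so we have

$$\mathbf{E}_S\Big[\Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[A \text{ accepts } f] \;\Big|\; S \text{ is bad}\Big] \ge \frac{1}{2}.$$

This means that $A$ accepts a function drawn from $\mathcal{D}_{\mathrm{no}}$ with probability at least

$$\Pr[A \text{ accepts } f] \ge \Pr[S \text{ is bad}] \cdot \mathbf{E}_S\Big[\Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[A \text{ accepts } f] \;\Big|\; S \text{ is bad}\Big] \ge \frac{5}{6} \cdot \frac{1}{2} = \frac{5}{12} > \frac{1}{3}.$$

Therefore, no deterministic $s$-sample $q$-query algorithm can distinguish functions drawn from $\mathcal{D}_{\mathrm{yes}}$ or from $\mathcal{D}_{\mathrm{no}}$ with probability at least $\frac{2}{3}$, and the lemma follows directly from Yao's Minimax Principle [100]. □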
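
The constants $\frac{3}{5}$ and $\frac{1}{2}$ appearing in this proof can be traced with exact arithmetic. The sketch below is a consistency check only, and it assumes three values: acceptance probability $\frac{2}{3}$ on yes-instances, probability $\frac{5}{6}$ that $S$ is bad, and ratio $\frac{6}{5}$ in (12.1). These are the values that reproduce the two displayed constants, and the variable names are illustrative.

```python
from fractions import Fraction

accept_yes = Fraction(2, 3)   # assumed acceptance probability on yes-instances
p_bad = Fraction(5, 6)        # assumed lower bound on Pr[S is bad]
ratio = Fraction(6, 5)        # assumed ratio between the two sides of (12.1)

# Even if A accepts whenever S is not bad, conditioning on a bad S gives
# E[accept | bad] >= 1 - (1 - accept_yes) / p_bad.
cond_yes = 1 - (1 - accept_yes) / p_bad

# Under (12.1), every answer vector r is at most `ratio` times likelier under
# the yes-distribution, so acceptance drops by at most that factor.
cond_no = cond_yes / ratio

# Unconditional acceptance probability on the no-distribution.
accept_no = p_bad * cond_no
```

Under these assumptions the algorithm accepts a no-instance with probability $\frac{5}{12}$, which exceeds the $\frac{1}{3}$ allowed for a tester.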

² Here, the probability that $A$ accepts is over the random choice of both $f$ and $S$.

We are now ready to complete the proof of Theorem 12.8.

Theorem 12.8. Active testing of dictatorships under the uniform distribution requires $\Omega(\log n)$ queries.

Proof. Let $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$ denote the uniform distributions over dictator functions and all boolean functions $\{0,1\}^n \to \{0,1\}$, respectively. Fix $X$ to be a set of $q$ queries from $\{0,1\}^n$. We can write $X$ as a $q \times n$ matrix. For each $r \in \{0,1\}^q$, write $c_r(X)$ to denote the number of columns of $X$ that are identical to $r$. For every $r$, we have

$$\Pr_{f \sim \mathcal{D}_{\mathrm{yes}}}[f(X) = r] - \Pr_{f \sim \mathcal{D}_{\mathrm{no}}}[f(X) = r] = \frac{c_r(X)}{n} - 2^{-q}.$$

When $X$ is formed by drawing $q$ queries independently and uniformly at random from $\{0,1\}^n$, we can equivalently say that $X$ is formed by drawing $n$ columns independently and uniformly at random from $\{0,1\}^q$. Thus, for any $r \in \{0,1\}^q$ we have

$$\mathbf{E}[c_r(X)] = 2^{-q} n$$

and by Chernoff's bound

$$\Pr_X\left[\left|\frac{c_r(X)}{n} - 2^{-q}\right| > \frac{1}{36} 2^{-q}\right] \le e^{-c n 2^{-q}}$$

for some appropriate constant $c$. By applying the union bound over all $2^q$ column types in $\{0,1\}^q$ and over the $\binom{\mathrm{poly}(n)}{q} \le e^{c' q \log n}$ ways to choose a set $X$ of $q$ queries from the set $S$ of $\mathrm{poly}(n)$ queries sampled independently from the uniform distribution, we get that

$$\Pr_S\left[\exists X \subseteq S,\ r \in \{0,1\}^q : \left|\frac{c_r(X)}{n} - 2^{-q}\right| > \frac{1}{36} 2^{-q}\right] \le e^{c' q \log n + q - c n 2^{-q}}.$$

When $q \le c'' \log n$ for some sufficiently small constant $c'' > 0$, the term $c n 2^{-q}$ dominates the exponent, so this probability is less than $\frac{1}{6}$ and the lower bound follows from Lemma 12.7. □
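
The counting argument can be illustrated numerically. In the sketch below (function names are hypothetical, and the parameters $q = 3$, $n = 80000$ are chosen only for illustration) the $q$ uniform queries are arranged as a $q \times n$ matrix. The fraction of columns equal to $r$ is the probability that a uniformly random dictator answers $r$, and it is compared with the value $2^{-q}$ that a uniformly random function produces.

```python
import random
from collections import Counter
from itertools import product

def column_type_frequencies(q, n, rng):
    """Return c_r(X)/n for every r in {0,1}^q, where X is a uniformly
    random q x n 0/1 matrix (one row per query)."""
    queries = [[rng.randint(0, 1) for _ in range(n)] for _ in range(q)]
    # Column j lists the answers of the dictator function x -> x_j.
    counts = Counter(tuple(row[j] for row in queries) for j in range(n))
    return {r: counts[r] / n for r in product((0, 1), repeat=q)}

def max_relative_deviation(freqs, q):
    """Largest value of |c_r(X)/n - 2^-q| / 2^-q over all column types r."""
    target = 2.0 ** (-q)
    return max(abs(p - target) / target for p in freqs.values())

rng = random.Random(1)
freqs = column_type_frequencies(q=3, n=80000, rng=rng)
deviation = max_relative_deviation(freqs, q=3)
```

When $q$ is small compared with $\log n$, every column type appears close to $n 2^{-q}$ times, so the answers of a random dictator are nearly indistinguishable from those of a random function.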

12.3  Upper  bound  for  Testing  LTFs 

We now turn to the problem of testing linear threshold functions (LTFs) in the active property testing model. For this section and the next one, we leave the setting of boolean functions to explore a slightly more general class of functions: functions of the form $f : \mathbb{R}^n \to \{0,1\}$. The definition of linear threshold functions, or halfspaces, in this setting is as follows.

Definition 12.9. The function $f : \mathbb{R}^n \to \{0,1\}$ is a linear threshold function if there exist $n + 1$ parameters $w_1, \ldots, w_n, t \in \mathbb{R}$ such that for every $x \in \mathbb{R}^n$,

$$f(x) = \mathrm{sgn}(w_1 x_1 + \cdots + w_n x_n - t),$$

where the function $\mathrm{sgn} : \mathbb{R} \to \{0,1\}$ returns $\mathrm{sgn}(z) = 1$ iff $z \ge 0$.


We will study the problem of testing linear threshold functions over the standard $n$-dimensional Gaussian distribution $\mathcal{N}^n(0,1)$ over $\mathbb{R}^n$. There is a close connection between this problem and the Hermite decomposition of functions that we introduced in Section 4.3.2. This connection is described in the following key lemma of Matulef, O'Donnell, Rubinfeld, and Servedio [77].

Lemma 12.10 (Matulef et al. [77]). There is an explicit continuous function $W : \mathbb{R} \to \mathbb{R}$ with bounded derivative $\|W'\|_\infty \le 1$ and peak value $W(0) = \frac{2}{\pi}$ such that every linear threshold function $f : \mathbb{R}^n \to \{-1,1\}$ satisfies

$$\sum_{i=1}^n \hat{f}(e_i)^2 = W(\mathbf{E} f).$$

Moreover, every function $g : \mathbb{R}^n \to \{-1,1\}$ that satisfies

$$\left|\sum_{i=1}^n \hat{g}(e_i)^2 - W(\mathbf{E} g)\right| \le 4\epsilon^3$$

is $\epsilon$-close to being a linear threshold function.


Lemma 12.10 shows that $\sum_{i=1}^n \hat{f}(e_i)^2$, the sum of the squares of the level-one Hermite coefficients of a function, provides a characterization of linear threshold functions. We can thus test linear threshold functions by estimating the value of this expression for a given function. We will do so by using the characterization of the level-1 Hermite weight of a function that we established in Section 4.3.2.


12.3.1  Algorithm 

Lemmas 12.10 and 4.18 suggest that linear threshold functions may be tested with the following simple LTFTester algorithm.

LTFTester($f$, $\epsilon$)

1. Draw a set $S \subseteq \mathbb{R}^n$ of $m$ samples independently from $\mathcal{N}^n(0,1)$.

2. Set $\tilde{\mu} = \mathbf{E}_{x \in S}[f(x)]$.

3. Set $\tilde{\nu} = \mathbf{E}_{x \neq y \in S}[f(x) f(y) \langle x, y \rangle]$.

4. Accept iff $|\tilde{\nu} - W(\tilde{\mu})| \le 2\epsilon^3$.

This algorithm queries the value of $f$ on all the samples in $S$, so it makes a total of $m$ queries. The value $\tilde{\mu}$ is an unbiased estimate of $\mathbf{E} f$ and, since the items of $S$ are drawn independently, we can use standard Chernoff bound arguments to show that $m = O(\sqrt{n})$ samples are sufficient to guarantee that this estimate has very small error with high probability.

The value $\tilde{\nu}$, by Lemma 4.18, is an unbiased estimate of $\sum_{i=1}^n \hat{f}(e_i)^2$. The values of $f(x) f(y) \langle x, y \rangle$ are no longer all independent, so we cannot use standard Chernoff bounds directly to bound the error of $\tilde{\nu}$. But the value $\tilde{\nu}$ is a U-statistic of order two, so we can use the concentration of measure tools from Section 4.1.3 to bound this error. In order to do so, however, it is convenient to modify the definition of $\tilde{\nu}$ slightly so that it is bounded. The resulting algorithm is described in Figure 12.1.
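
The untruncated estimate $\tilde{\nu}$ of step 3 can be computed in time linear in the number of samples. The sketch below is illustrative only: the function name is hypothetical, functions are taken to be $\{-1,1\}$-valued as in Lemma 12.10, and the truncation used by the final algorithm is omitted. It relies on the identity $\sum_{x \neq y} f(x) f(y) \langle x, y \rangle = \|\sum_x f(x) x\|^2 - \sum_x f(x)^2 \|x\|^2$.

```python
import math
import random

def level_one_weight_estimate(f, samples):
    """Average of f(x) f(y) <x, y> over all ordered pairs x != y in samples."""
    m, n = len(samples), len(samples[0])
    weighted_sum = [0.0] * n      # accumulates sum_x f(x) * x
    diagonal = 0.0                # accumulates sum_x f(x)^2 * ||x||^2
    for x in samples:
        label = f(x)
        for i in range(n):
            weighted_sum[i] += label * x[i]
        diagonal += label * label * sum(xi * xi for xi in x)
    off_diagonal = sum(v * v for v in weighted_sum) - diagonal
    return off_diagonal / (m * (m - 1))

rng = random.Random(0)
n, m = 20, 4000
samples = [tuple(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(m)]

# A balanced halfspace, and a "random function" given by independent labels.
halfspace = lambda x: 1 if sum(x) >= 0 else -1
random_labels = {x: rng.choice((-1, 1)) for x in samples}

nu_halfspace = level_one_weight_estimate(halfspace, samples)
nu_random = level_one_weight_estimate(random_labels.get, samples)
```

For the balanced halfspace the estimate should land near $W(0) = \frac{2}{\pi} \approx 0.637$, while for random labels it should be close to $0$. That gap is what the tester exploits.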


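LTFTester*($f$, $\epsilon$)

Parameters: $\tau = \sqrt{4n \log(4n/\epsilon^3)}$, $m = 800\tau/\epsilon^3 + 32/\epsilon^6$.

1. Draw a set $S \subseteq \mathbb{R}^n$ of $m$ samples independently from $\mathcal{N}^n(0, I)$.

2. Set $\tilde{\mu} = \mathbf{E}_{x \in S}[f(x)]$.

3. Set $\tilde{\nu} = \mathbf{E}_{x \neq y \in S}[f(x) f(y) \langle x, y \rangle \cdot \mathbf{1}[|\langle x, y \rangle| \le \tau]]$.

4. Accept iff $|\tilde{\nu} - W(\tilde{\mu})| \le 2\epsilon^3$.


Figure 12.1: Algorithm for testing linear threshold functions in the active testing model.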
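
The parameters of Figure 12.1 can be evaluated directly. The sketch below is a small numerical aid with hypothetical function names, and it assumes that $\log$ denotes the natural logarithm. With that reading, the threshold $\tau$ is exactly the value for which $e^{-\tau^2/(4n)} = \epsilon^3/(4n)$, the tail probability used when bounding the effect of the truncation.

```python
import math

def ltf_tester_parameters(n, eps):
    """Truncation threshold tau and sample size m from Figure 12.1."""
    tau = math.sqrt(4 * n * math.log(4 * n / eps ** 3))
    m = 800 * tau / eps ** 3 + 32 / eps ** 6
    return tau, m

def truncation_tail_bound(n, eps):
    """The quantity exp(-tau^2 / (4n)) for the threshold tau above."""
    tau, _ = ltf_tester_parameters(n, eps)
    return math.exp(-tau ** 2 / (4 * n))

n, eps = 10 ** 6, 0.5
tau, m = ltf_tester_parameters(n, eps)
tail = truncation_tail_bound(n, eps)
```

For fixed $\epsilon$ the sample size $m$ grows like $\sqrt{n \log n}$, although the constants in Figure 12.1 are large.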


12.3.2  Analysis  of  the  algorithm 

To verify the correctness of the LTFTester* algorithm, we must establish two facts: that $\tilde{\mu}$ and $\tilde{\nu}$ are close enough to the values they estimate, and that $\mathbf{E}\,\tilde{\nu}$ is close enough to the true value $\mathbf{E}[f(x) f(y) \langle x, y \rangle]$ that we want to compute. We begin with this latter task.

For a function $f : \mathbb{R}^n \to \mathbb{R}$, define $\psi_f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ to be $\psi_f(x, y) = f(x) f(y) \langle x, y \rangle$. Let $\psi_f^* : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be the truncation of $\psi_f$ defined by setting

$$\psi_f^*(x, y) = \begin{cases} f(x) f(y) \langle x, y \rangle & \text{if } |\langle x, y \rangle| \le \sqrt{4n \log(4n/\epsilon^3)}, \\ 0 & \text{otherwise.} \end{cases}$$


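The next lemma shows that $\mathbf{E}\,\psi_f$ and $\mathbf{E}\,\psi_f^*$ are close.

Lemma 12.11. For any function $f : \mathbb{R}^n \to \mathbb{R}$, $|\mathbf{E}\,\psi_f - \mathbf{E}\,\psi_f^*| \le \frac{1}{2}\epsilon^3$.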


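Proof. For notational clarity, fix $\tau = \sqrt{4n \log(4n/\epsilon^3)}$. By the definition of $\psi_f$ and $\psi_f^*$ and with the trivial bound $|f(x) f(y) \langle x, y \rangle| \le n$ we have

$$|\mathbf{E}\,\psi_f - \mathbf{E}\,\psi_f^*| = \Pr_{x,y}\big[|\langle x, y \rangle| > \tau\big] \cdot \Big|\mathbf{E}_{x,y}\big[f(x) f(y) \langle x, y \rangle \;\big|\; |\langle x, y \rangle| > \tau\big]\Big| \le n \cdot \Pr_{x,y}\big[|\langle x, y \rangle| > \tau\big].$$

The right-most term can be bounded with a standard Chernoff argument. By Markov's inequality and the independence of the variables $x_1, \ldots, x_n, y_1, \ldots, y_n$, for any $t > 0$,

$$\Pr_{x,y}\big[\langle x, y \rangle > \tau\big] = \Pr_{x,y}\big[e^{t \langle x, y \rangle} > e^{t\tau}\big] \le \frac{\mathbf{E}\, e^{t \langle x, y \rangle}}{e^{t\tau}} = \frac{\prod_{i=1}^n \mathbf{E}\, e^{t x_i y_i}}{e^{t\tau}}.$$

The moment generating function of a standard normal random variable is $\mathbf{E}\, e^{ty} = e^{t^2/2}$, so

$$\mathbf{E}_{x_i, y_i}\big[e^{t x_i y_i}\big] = \mathbf{E}_{x_i}\big[\mathbf{E}_{y_i}\, e^{t x_i y_i}\big] = \mathbf{E}_{x_i}\, e^{(t^2/2) x_i^2}.$$

When $x \sim \mathcal{N}(0,1)$, the random variable $x^2$ has a $\chi^2$ distribution with 1 degree of freedom. The moment generating function of this variable is $\mathbf{E}\, e^{t x^2} = \sqrt{\frac{1}{1-2t}} = \sqrt{1 + \frac{2t}{1-2t}}$ for any $t < \frac{1}{2}$. Hence,

$$\mathbf{E}_{x_i}\, e^{(t^2/2) x_i^2} \le \sqrt{1 + \frac{t^2}{1 - t^2}} \le e^{\frac{t^2}{2(1 - t^2)}}$$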

for any $t < 1$. Combining the above results and setting $t$ appropriately yields

$$\Pr_{x,y}\big[\langle x, y \rangle > \tau\big] \le e^{\frac{n t^2}{2(1 - t^2)} - t\tau} \le e^{-\frac{\tau^2}{4n}} = \frac{\epsilon^3}{4n}.$$

The same argument shows that $\Pr[\langle x, y \rangle < -\tau] \le \frac{\epsilon^3}{4n}$ as well. □

We can now complete the analysis of the LTFTester* algorithm and the proof of the upper bound in Theorem 12.2.

Theorem 12.12 (Upper bound in Theorem 12.2, restated). There is an active tester for linear threshold functions over the standard $n$-dimensional Gaussian distribution that makes $O(\sqrt{n \log n})$ queries.

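Proof. Consider the LTFTester* algorithm. When the estimates $\tilde{\mu}$ and $\tilde{\nu}$ satisfy

$$|\tilde{\mu} - \mathbf{E} f| \le \epsilon^3 \quad\text{and}\quad \big|\tilde{\nu} - \mathbf{E}[f(x) f(y) \langle x, y \rangle]\big| \le \epsilon^3,$$

Lemmas 12.10 and 4.18 guarantee that the algorithm correctly distinguishes LTFs from functions that are far from LTFs. To complete the proof, we must therefore show that the estimates are within the specified error bounds with probability at least 2/3.

The values $f(x_1), \ldots, f(x_m)$ are independent $\{-1, 1\}$-valued random variables. By Hoeffding's inequality,

$$\Pr\big[|\tilde{\mu} - \mathbf{E} f| \le \epsilon^3\big] \ge 1 - 2e^{-\epsilon^6 m/2} = 1 - 2e^{-O(\sqrt{n})}.$$

The estimate $\tilde{\nu}$ is a U-statistic with kernel $\psi_f^*$. This kernel satisfies

$$\|\psi_f^* - \mathbf{E}\,\psi_f^*\|_\infty \le 2\|\psi_f^*\|_\infty = 2\sqrt{4n \log(4n/\epsilon^3)}.$$

Let $\Sigma^2 = \mathbf{E}_x\big[\mathbf{E}_y[\psi_f^*(x, y)]^2\big] - \mathbf{E}_{x,y}[\psi_f^*(x, y)]^2$. Then

$$\Sigma^2 \le \mathbf{E}_y\big[\mathbf{E}_x[\psi_f^*(x, y)]^2\big] = \mathbf{E}_y\Big[\mathbf{E}_x\big[f(x) f(y) \langle x, y \rangle\, \mathbf{1}[|\langle x, y \rangle| \le \tau]\big]^2\Big].$$

Applying the Cauchy-Schwarz inequality to the expression for $\Sigma^2$ gives

$$\Sigma^2 \le \mathbf{E}_y\Big[\mathbf{E}_x\big[f(x) f(y) \langle x, y \rangle\big]^2\Big] = \mathbf{E}_y\Big[\Big(\sum_{i=1}^n f(y)\, y_i\, \mathbf{E}_x[f(x) x_i]\Big)^2\Big] = \sum_{i,j} \hat{f}(e_i) \hat{f}(e_j)\, \mathbf{E}_y[y_i y_j] = \sum_{i=1}^n \hat{f}(e_i)^2.$$

By Parseval's identity, we have $\sum_i \hat{f}(e_i)^2 \le \|\hat{f}\|_2^2 = \|f\|_2^2 = 1$. We can now apply Arcones' Theorem (Theorem 4.7) and Lemma 12.11 to obtain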

_ ml2 _ 

Pr[|z>  —  E-0/1  <  e3]  =  Pr[|z>  —  E^|  <  |e3]  >  1  —  4e  >  il. 

The union bound completes the proof of correctness. □


12.4  Lower  Bound  for  Testing  LTFs

The last section showed that it is possible to test linear threshold functions with $\tilde{O}(\sqrt{n})$ queries in the active testing model. The lower bound in Section 12.2 on the number of queries needed to distinguish dictator functions from random functions implies that $\Omega(\log n)$ queries are required to test linear threshold functions in the active model.

In  this  section,  we  prove  a  better  lower  bound  showing  that  the  number  of  queries 
required  to  test  linear  threshold  functions  in  the  active  model  must  be  polynomial  in  n.  To 
establish  this  lower  bound,  we  begin  by  extending  Lemma  12.7  to  real-valued  functions. 

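Lemma 12.13. Fix a property $\mathcal{P} \subseteq \mathbb{R}^n \to \mathbb{R}$. Let $\mathcal{D}_{\mathrm{yes}}$ be a distribution over $\mathcal{P}$ and let $\mathcal{D}_{\mathrm{no}}$ be a distribution such that $f \sim \mathcal{D}_{\mathrm{no}}$ is $\epsilon$-far from $\mathcal{P}$ with probability $1 - o(1)$. For any set $X \subseteq \mathbb{R}^n$ of size $|X| = q$, let $p_{X,\mathrm{yes}}, p_{X,\mathrm{no}} : \mathbb{R}^q \to \mathbb{R}$ be the probability density functions for the random variable $f(X)$ when $f \sim \mathcal{D}_{\mathrm{yes}}$ and $f \sim \mathcal{D}_{\mathrm{no}}$, respectively. Let $S \subseteq \mathbb{R}^n$ be a set of $s$ samples drawn independently from the $n$-dimensional Gaussian distribution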
J\fn(0,I).  Suppose  that  with  probability  at  least  |  ( over  the  choice  of  S),  every  subset 
X  C  S  of  size  |  A"  |  =  q  satisfy 

amo  n  [^^(r)  <  fPAno(r)]  >  |. 

r~fs/q(0J) 

Then there is no $s$-sample $q$-query $\epsilon$-tester for $\mathcal{P}$.

The  proof  of  Lemma  12.13  is  essentially  identical  to  that  of  Lemma  12.7.  We  refer  the 
reader  to  [13]  for  the  details. 

We  now  complete  the  proof  of  the  lower  bound  in  Theorem  12.2. 

Theorem 12.14 (Lower bound of Theorem 12.2 restated). At least $\tilde{\Omega}(n^{1/3})$ queries are required to test linear threshold functions over the standard $n$-dimensional Gaussian distribution in the active testing model.

Proof. We begin by introducing two distributions $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$ over linear threshold functions and functions that (with high probability) are far from linear threshold functions, respectively. We draw a function $f$ from $\mathcal{D}_{\mathrm{yes}}$ by first drawing a vector $w \sim \mathcal{N}(0, I_{n \times n})$ from the $n$-dimensional standard normal distribution. We then define $f : x \mapsto \mathrm{sgn}(\frac{1}{\sqrt{n}}\, x \cdot w)$. To draw a function $g$ from $\mathcal{D}_{\mathrm{no}}$, we define $g(x) = \mathrm{sgn}(y_x)$ where each $y_x$ variable is drawn independently from the standard normal distribution $\mathcal{N}(0, 1)$.

Let $X \in \mathbb{R}^{n \times q}$ be a random matrix obtained by drawing $q$ vectors from the $n$-dimensional normal distribution $\mathcal{N}(0, I_{n \times n})$ and setting these vectors to be the columns of $X$. Equivalently, $X$ is the random matrix whose entries are independent standard normal variables. When we view $X$ as a set of $q$ queries to a function $f \sim \mathcal{D}_{\mathrm{yes}}$ or a function $g \sim \mathcal{D}_{\mathrm{no}}$, we get $f(X) = \mathrm{sgn}(\frac{1}{\sqrt{n}} X^* w)$ and $g(X) = \mathrm{sgn}(y_X)$. Note that $\frac{1}{\sqrt{n}} X^* w \sim \mathcal{N}(0, \frac{1}{n} X^* X)$ and $y_X \sim \mathcal{N}(0, I_{q \times q})$. To apply Lemma 12.13 it suffices to show that the ratio of the pdfs for
both  these  random  variables  is  bounded  by  |  for  all  but  1  of  the  probability  mass. 

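The pdf $p : \mathbb{R}^q \to \mathbb{R}$ of a $q$-dimensional random vector from the distribution $\mathcal{N}_{q \times q}(0, \Sigma)$ is

$$p(x) = (2\pi)^{-q/2} \det(\Sigma)^{-1/2} e^{-\frac{1}{2} x^T \Sigma^{-1} x}.$$

Therefore, the ratio function $r : \mathbb{R}^q \to \mathbb{R}$ between the pdfs of $\frac{1}{\sqrt{n}} X^* w$ and of $y_X$ is

$$r(x) = \det(\tfrac{1}{n} X^* X)^{-1/2}\, e^{\frac{1}{2} x^T ((\frac{1}{n} X^* X)^{-1} - I) x}.$$

Note that

$$x^T\big((\tfrac{1}{n} X^* X)^{-1} - I\big) x \le \big\|(\tfrac{1}{n} X^* X)^{-1} - I\big\|\, \|x\|_2^2 = \big\|\tfrac{1}{n} X^* X - I\big\|\, \|x\|_2^2$$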
so  by  Lemma  4.8  with  probability  at  least  1  —  2e_*2/2  we  have 


r(x)  <  e 


q  I  (y/q+t)  i  o  y/q+t  \  i  q  y/q+t 
2 


+2^)+3^INIl 


By  a  union  bound,  for  U  ~  A/”(0,  Inxn)u ,  u  G  N  with  u  >  q,  the  above  inequality  for 
r{x)  is  true  for  all  subsets  of  U  of  size  q,  with  probability  at  least  1  —  uq 2e  t2//2.  Fix 
q  =  n5/(50(ln(-u))5)  and  t  =  2 yjq ln(u).  Then  uq2e~t2^2  <  2 u~q,  which  is  <  1/4  for 
any  sufficiently  large  n.  When  1 1 x \ \ 2  <  3 q  then  for  large  n,  r(x)  <  e74/625  <  |.  To 
complete  the  proof,  it  suffices  to  show  that  when  x  ~  J\f( 0,  Iqxq),  the  probability  that 
\\x\\l  >  is  at  most  \2~q.  The  random  variable  ||a;|||  has  a  y2  distribution  with  q  degrees 
of  freedom  and  expected  value  E  ||:r||2  =  Yli=i  E  xi  =  <?•  Standard  concentration  bounds 
for  x2  variables  imply  that 


Pr 

x~Af(0,IqXq) 


x\\l  >  3 q]  <  e"59  <  \2~q, 


as  we  wanted  to  show,  and  the  theorem  follows  from  Lemma  12.13. 


□ 


12.5  Notes  and  Discussion 

Other results in active testing. The manuscript [13] contains other results on active testing that we did not discuss in this chapter. Notably, it is shown there that the property of being a union of $d$ intervals can be tested with a constant number of queries in the active testing model. This is in contrast to the problem of learning unions of $d$ intervals, which requires $\Omega(d)$ queries in the active learning model.

The  active  testing  results  also  show  that  it  is  possible  to  test  common  assumptions 
from  the  semi-supervised  learning  model  with  a  constant  number  of  queries.  In  particular, 
testing  cluster  assumptions  and  margin  assumptions  can  both  be  done  with  a  constant 
number  of  queries.  For  the  details  and  more  background,  see  [13]. 


Active  testing  vs.  distribution-free  testing.  The  active  testing  model  that  we  introduce 
in  this  chapter  is  not  the  first  alternative  model  of  property  testing  that  was  proposed  to 
better  mirror  realistic  learning  models.  Notably,  Halevy  and  Kushilevitz  [63]  introduced  a 
distribution-free  model  of  property  testing  that  has  since  been  studied  extensively  [61,  62, 
56, 48]. In the distribution-free model, the tester can sample inputs from some unknown
distribution and can query the target function on any input of its choosing. It must then
distinguish the case where $f \in \mathcal{P}$ from the case where $f$ is far from the property
over the (unknown) distribution.

If  we  consider  distributions  with  support  size  that  is  polynomial  in  the  dimension  of  the 
function  being  tested,  the  distribution-free  testing  model  appears  to  be  similar  to  our  active 
testing  model.  Indeed,  in  this  case  the  distance  of  the  input  function  to  a  fixed  property 
$\mathcal{P}$ depends only on the values of the function on inputs in the support of the distribution.
So  intuitively  it  seems  that  the  tester  might  as  well  only  query  the  value  of  the  function  on 
inputs  in  the  support  of  the  distribution.  This  intuition  is  wrong:  in  fact,  most  testers  in  the 
distribution-free  model  strongly  rely  on  the  ability  to  query  any  input  of  their  choosing. 
As  a  result,  the  task  of  testing  a  property  in  the  active  model  is  very  different  from  the  task 
of  testing  the  same  property  in  the  distribution-free  model  and,  indeed,  many  properties  of 
boolean  functions  have  very  different  query  complexities  in  the  two  models. 




Bibliography 


[1]  Jose  A.  Adell  and  Pedro  Jodra.  Exact  Kolmogorov  and  total  variation  distances 
between some familiar discrete distributions. Journal of Inequalities and Applications,
2006. 4.4

[2]  Rudolf  Ahlswede  and  Levon  H.  Khachatrian.  The  complete  intersection  theorem 
for  systems  of  finite  sets.  European  Journal  of  Combinatorics,  18:125-136,  1997. 
4.2.1 

[3] Noga Alon and Eric Blais. Testing boolean function isomorphism. Proc. 14th International
Workshop on Randomization and Approximation Techniques in Computer
Science, pages 394-405, 2010. (document)

[4] Noga Alon, Eric Blais, Sourav Chakraborty, David Garcia Soriano, and Arie Matsliah.
Nearly tight bounds for testing function isomorphism, 2011. Manuscript.
(document), 9.3

[5] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A combinatorial characterization
of the testable graph properties: It’s all about regularity. SIAM Journal
on  Computing,  39:143-167,  2009.  8 

[6]  Noga  Alon  and  Asaf  Shapira.  A  characterization  of  the  (natural)  graph  properties 
testable  with  one-sided  error.  In  Proc.  of  the  46th  IEEE  Symposium  on  Foundations 
of  Computer  Science,  pages  429-438,  2005.  8 

[7]  Noga  Alon  and  Asaf  Shapira.  Every  monotone  graph  property  is  testable.  In  Proc. 
of  the  46th  IEEE  Symposium  on  Foundations  of  Computer  Science,  pages  128-137, 
2005.  8 

[8]  Noga  Alon  and  Joel  H.  Spencer.  The  Probabilistic  Method.  Wiley,  third  edition, 
2008.  9.1 




[9]  Miguel  A.  Arcones.  A  Bernstein-type  inequality  for  U-statistics  and  U-processes. 
Statistics & Probability Letters, 22(3):239-247, 1995. 4.1.3, 4.7

[10]  R.  F.  Arnold  and  M.  A.  Harrison.  Algebraic  properties  of  symmetric  and  partially 
symmetric functions. IEEE Transactions on Electronic Computers, EC-12:244-251,
1963. 6.4

[11] Tim Austin and Terence Tao. Testability and repair of hereditary hypergraph properties.
Random Structures and Algorithms, 36:373-463, 2010. 8

[12]  Laszlo  Babai,  Robert  Beals,  and  Pal  Takacsi-Nagy.  Symmetry  and  complexity.  In 
Proc.  of  the  24th  annual  ACM  symposium  on  Theory  of  computing,  pages  438-449, 
1992.  6.4 

[13]  Maria-Florina  Balcan,  Eric  Blais,  Avrim  Blum,  and  Liu  Yang.  Active  testing,  2011. 
Manuscript. (document), 12.4, 12.5

[14] Mihir Bellare, Don Coppersmith, Johan Hastad, Marcos Kiwi, and Madhu Sudan.
Linearity testing in characteristic two. IEEE Trans. on Information Theory,
42(6):1781-1795, 1996. 7

[15] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits, PCPs and non-approximability
- towards tight results. SIAM J. Comput., 27(3):804-915, 1998.
5, 5, 7

[16] Mihir Bellare, Shafi Goldwasser, Carsten Lund, and Alexander Russell. Efficient
probabilistically  checkable  proofs  and  applications  to  approximations.  In  Proc.  of 
the  25th  Symposium  on  Theory  of  Computing,  pages  294-304,  1993.  5,  7 

[17]  Mihir  Bellare  and  Madhu  Sudan.  Improved  non-approximability  results.  In  Proc. 
of  the  26th  Symposium  on  Theory  of  Computing,  pages  184-193,  1994.  5,  7 

[18] Arnab Bhattacharyya, Elena Grigorescu, and Asaf Shapira. A unified framework
for testing linear-invariant properties. In Proc. 51st Annual IEEE Symposium on
Foundations of Computer Science (FOCS), pages 478-487, 2010. 1.3, 8

[19]  Eric  Blais.  Improved  bounds  for  testing  juntas.  In  Proc.  12th  Workshop  RANDOM, 
pages  317-330,  2008.  (document),  5.3 

[20]  Eric  Blais.  Testing  juntas  nearly  optimally.  In  Proc.  41st  Annual  ACM  Symposium 
on  Theory  of  Computing  (STOC),  pages  151-158,  2009.  (document),  5.3,  7.3,  12 




[21]  Eric  Blais.  Testing  juntas:  a  brief  survey.  In  O.  Goldreich,  editor,  Property  Testing: 
Current Research and Surveys, pages 32-40. Springer, 2010. 5

[22]  Eric  Blais,  Joshua  Brody,  and  Kevin  Matulef.  Property  testing  lower  bounds  via 
communication complexity. In Proc. 26th Annual IEEE Conference on Computational
Complexity (CCC), pages 210-220, 2011. (document), 7.4

[23]  Eric  Blais,  Joshua  Brody,  and  Kevin  Matulef.  Property  testing  lower  bounds  via 
communication complexity. Computational Complexity, 2012. To appear. (document),
7.4, 11.6

[24]  Eric  Blais  and  Daniel  Kane.  Testing  properties  of  linear  functions,  2011. 
Manuscript. (document), 7.4, 7.4

[25]  Eric  Blais  and  Ryan  O’Donnell.  Lower  bounds  for  testing  function  isomorphism.  In 
Proc.  25th  Conference  on  Computational  Complexity  (CCC),  pages  235-246,  2010. 
(document),  10.3 

[26]  Eric  Blais,  Amit  Weinstein,  and  Yuichi  Yoshida.  Partially  symmetric  functions  are 
isomorphism-testable, 2011. Manuscript. (document), 8, 10.3

[27] Avrim Blum. Relevant examples and relevant features: thoughts from computational
learning theory. In AAAI Fall Symposium on ‘Relevance’, 1994. 5

[28]  Avrim  Blum.  Learning  a  function  of  r  relevant  variables.  In  Proc.  16th  Conference 
on  Computational  Learning  Theory,  pages  731-733,  2003.  5 

[29]  Avrim  Blum,  Lisa  Hellerstein,  and  Nick  Littlestone.  Learning  in  the  presence  of 
finitely or infinitely many irrelevant attributes. J. of Comp. Syst. Sci., 50(1):32-40,
1995. 1.3.1, 5, 5.1

[30] Avrim Blum and Pat Langley. Selection of relevant features and examples in machine
learning. Artificial Intelligence, 97(2):245-271, 1997. 5

[31] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with applications
to numerical problems. J. Comput. Syst. Sci., 47:549-595, 1993. Earlier
version  in  STOC’90.  7 

[32]  Jean  Bourgain.  On  the  distribution  of  the  Fourier  spectrum  of  boolean  functions. 
Israel  Journal  of  Mathematics,  131:269-276,  2002.  5 




[33] Jop Briet, Sourav Chakraborty, David Garcia Soriano, and Arie Matsliah. Monotonicity
testing and shortest-path routing on the cube. In Proc. 14th International
Workshop  on  Randomization  and  Approximation  Techniques  in  Computer  Science, 
2010.  11.4 

[34] Joshua Brody, Kevin Matulef, and Chenggang Wu. Lower bounds for testing computability
by small-width branching programs. In Proc. 8th Annual Theory and
Applications of Models of Computation, 2011. 11.6

[35] Harry Buhrman, Richard Cleve, and Avi Wigderson. Quantum vs. classical communication
and computation. In Proc. 30th Annual ACM Symposium on the Theory
of  Computing,  pages  63-68,  1998.  11.1.3,  11.6 

[36]  Samuel  H.  Caldwell.  Switching  Circuits  and  Logical  Design.  Wiley,  1958.  6.4 

[37] Sourav Chakraborty, Eldar Fischer, David Garcia Soriano, and Arie Matsliah.
Junto-symmetric functions, hypergraph isomorphism, and crunching, 2012.
Manuscript.  6.4,  6.4 

[38]  Sourav  Chakraborty,  David  Garcia  Soriano,  and  Arie  Matsliah,  2010.  Personal 
Communication.  7.3 

[39]  Sourav  Chakraborty,  David  Garcia  Soriano,  and  Arie  Matsliah.  Efficient  sample 
extractors  for  juntas  with  applications.  Automata,  Languages  and  Programming, 
pages  545-556,  2011.  5.3,  5.1,  7.3,  8.2 

[40]  Sourav  Chakraborty,  David  Garcia  Soriano,  and  Arie  Matsliah.  Nearly  tight  bounds 
for  testing  function  isomorphism.  In  Proc.  22nd  Annual  ACM-SIAM  Symposium  on 
Discrete  Algorithms  (SODA),  pages  1683-1702,  2011.  (document),  5.3,  7.3,  8,  8.1, 
9.3 

[41]  Hana  Chockler  and  Dan  Gutfreund.  A  lower  bound  for  testing  juntas.  Information 
Processing  Letters,  90(6):301-305,  2004.  5,  7.3 

[42] P. Clote and E. Kranakis. Boolean functions, invariance groups, and parallel complexity.
SIAM J. of Computing, 20:553-590, 1991. 6.4

[43]  S.R.  Das  and  C.L.  Sheng.  On  detecting  total  or  partial  symmetry  of  switching 
functions. IEEE Trans. on Computers, C-20(3):352-355, 1971. 6.4

[44] Victor H. de la Pena and Evarist Gine. Decoupling: From Dependence to Independence.
Springer, 1999. 4.1.3




[45]  Ronald  de  Wolf.  A  Brief  Introduction  to  Fourier  Analysis  on  the  Boolean  Cube. 
Number  1  in  Graduate  Surveys.  Theory  of  Computing  Library,  2008.  2 

[46] Ilias Diakonikolas, Homin K. Lee, Kevin Matulef, Krzysztof Onak, Ronitt Rubinfeld,
Rocco A. Servedio, and Andrew Wan. Testing for concise representations. In
Proc.  48th  Symposium  on  Foundations  of  Computer  Science,  pages  549-558,  2007. 
5,  5.3,  7.3,  7.3.4,  7.7,  12 

[47]  Irit  Dinur  and  Shmuel  Safra.  On  the  hardness  of  approximating  minimum  vertex 
cover. Annals of Mathematics, 162(1):439-485, 2005. 4.2.1, 4.10, 4.2.1, 5

[48]  Elya  Dolev  and  Dana  Ron.  Distribution-free  testing  algorithms  for  monomials  with 
a sublinear number of queries. In Proceedings of the 13th international conference
on Approximation, and 14th International conference on Randomization, and
combinatorial optimization: algorithms and techniques, APPROX/RANDOM’10,
pages 531-544. Springer-Verlag, 2010. 12.5

[49]  Paul  Erdos,  Chao  Ko,  and  Richard  Rado.  Intersection  theorems  for  systems  of  finite 
sets. The Quarterly Journal of Mathematics, 12(1):313-320, 1961. 4.2.1

[50]  W.  Feller.  An  introduction  to  probability  theory  and  its  applications,  volume  1. 
John  Wiley  &  Sons,  1968.  4.1 

[51] W. Feller. An introduction to probability theory and its applications, volume 2.
John  Wiley  &  Sons,  1971.  4.1 

[52]  Eldar  Fischer,  Guy  Kindler,  Dana  Ron,  Shmuel  Safra,  and  Alex  Samorodnitsky. 
Testing  juntas.  J.  Comput.  Syst.  Sci.,  68(4):753-787,  2004.  1.3.1,  1.3.2,  5,  5.1,  1,  7, 
8, 8, 8.1, 9, 10, 12

[53]  Peter  Frankl.  The  Erdos-Ko-Rado  theorem  is  true  for  n  =  ckt.  In  Combinatorics 
(Proc.  Fifth  Hungarian  Colloquium,  Keszthely),  volume  1,  pages  365-375,  1976. 
4.2.1 

[54] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates.
Combinatorica, 18:474-483, 1998. 5

[55]  Ehud  Friedgut.  On  the  measure  of  intersecting  families,  uniqueness  and  stability. 
Combinatorica, 28(5):503-528, 2008. 4.2.1, 4.10

[56]  Dana  Glasner  and  Rocco  A.  Servedio.  Distribution-free  testing  lower  bound  for 
basic boolean functions. Theory of Computing, 5(1):191-216, 2009. 12.5




[57] Oded Goldreich. On testing computability by small width OBDDs. Proc. 14th International
Workshop on Randomization and Approximation Techniques in Computer
Science,  pages  574-587,  2010.  1.3.1,  7,  7.3,  7.3.7 

[58] Oded Goldreich, editor. Property Testing: Current Research and Surveys, volume
6390  of  LNCS.  Springer,  2010.  3 

[59] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection
to learning and approximation. J. of the ACM, 45(4):653-750, 1998. 3.2,
12.1 

[60]  Andras  Hajnal  and  Endre  Szemeredi.  Proof  of  a  conjecture  of  Paul  Erdos.  In 
Paul  Erdos,  Alfred  Renyi,  and  Vera  T.  Sos,  editors,  Combinatorial  Theory  and  its 
Applications  II,  volume  4,  pages  601-623.  Colloq.  Math.  Soc.  Janos  Bolyai,  1969. 
4.2.2, 4.11, 9.2

[61]  Shirley  Halevy  and  Eyal  Kushilevitz.  Distribution-free  connectivity  testing.  In 
Approximation,  Randomization,  and  Combinatorial  Optimization.  Algorithms  and 
Techniques,  volume  3122  of  Lecture  Notes  in  Computer  Science,  pages  393-404. 
Springer  Berlin  /  Heidelberg,  2004.  12.5 

[62] Shirley Halevy and Eyal Kushilevitz. A lower bound for distribution-free monotonicity
testing. In Approximation, Randomization and Combinatorial Optimization,
volume 3624 of Lecture Notes in Computer Science, pages 612-612. Springer
Berlin  /  Heidelberg,  2005.  12.5 

[63]  Shirley  Halevy  and  Eyal  Kushilevitz.  Distribution-free  property  testing.  SIAM  J. 
Comput., 37(4):1107-1138, 2007. 12.5

[64] Paul R. Halmos. The theory of unbiased estimation. Ann. Math. Statis., 17(1):34-43,
1946. 4.1.3

[65]  Johan  Hastad.  Some  optimal  inapproximability  results.  J.  ACM,  48:798-859,  2001. 
5 

[66]  Wassily  Hoeffding.  A  class  of  statistics  with  asymptotically  normal  distribution. 
Ann.  of  Math.  Stat.,  19(3):293-325,  1948.  4.1.3 

[67]  Tommy  R.  Jensen  and  Bjarne  Toft.  Graph  Coloring  Problems.  Wiley,  1994.  4.2.2 

[68] Madhav Jha and Sofya Raskhodnikova. Testing and reconstruction of lipschitz functions
with applications to data privacy. In Proc. 52nd Annual IEEE Symposium on
Foundations of Computer Science, pages 433-442, 2011. 11.6




[69] Gil Kalai. A Fourier-theoretic perspective on the Condorcet paradox and Arrow’s
theorem.  Adv.  in  Applied  Math.,  29:412-426,  2002.  5 

[70]  Bala  Kalyanasundaram  and  Georg  Schnitger.  The  probabilistic  communication 
complexity  of  set  intersection.  SIAM  J.  Disc.  Math.,  5(4):547-557,  1992.  11,  11.4 

[71] Tali Kaufman, Simon Litsyn, and Ning Xie. Breaking the ε-soundness bound of the
linearity  test  over  GF(2).  SIAM  J.  on  Computing,  39:1988-2003,  2010.  7 

[72]  Tali  Kaufman  and  Madhu  Sudan.  Algebraic  property  testing:  the  role  of  invariance. 
In  Proc.  40th  Annual  ACM  Symposium  on  Theory  of  Computing  (STOC),  pages 
403-412,  2008.  1.3,  6.4,  8 

[73] M. Kearns and U. Vazirani. An Introduction to Computational Learning Theory.
MIT  Press,  1994.  3.2 

[74] Michael Kearns and Dana Ron. Testing problems with sublearning sample complexity.
Journal of Computer and System Sciences, 61(3):428-456, 2000. 12

[75] Subhash Khot. On the power of unique 2-prover 1-round games. In Proc. 34th ACM
Symposium  on  the  Theory  of  Computing,  pages  767-775,  2002.  5 

[76] Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge University
Press, 1997. 11.1

[77]  Kevin  Matulef,  Ryan  O’Donnell,  Ronitt  Rubinfeld,  and  Rocco  Servedio.  Testing 
halfspaces.  In  Proc.  20th  Symposium  on  Discrete  Algorithms,  2009.  12,  12.3,  12.10 

[78]  Michael  Molloy  and  Bruce  Reed.  Graph  Colouring  and  the  Probabilistic  Method. 
Springer,  2001.  4.2.2 

[79]  Elchanan  Mossel,  Ryan  O’Donnell,  and  Rocco  A.  Servedio.  Learning  functions  of 
k  relevant  variables.  J.  Comput.  Syst.  Sci.,  69(3):421-434,  2004.  5 

[80] Ryan O’Donnell. Some topics in analysis of boolean functions. In Proc. 40th Annual
ACM  Symposium  on  the  Theory  of  Computing,  pages  569-578,  2008.  2 

[81] Ryan O’Donnell. Analysis of boolean functions, 2012. Available at
analysisofbooleanfunctions.org. 2

[82]  Ryan  O’Donnell  and  Karl  Wimmer.  KKL,  Kruskal-Katona,  and  monotone  nets.  In 
Proc.  50th  Annual  IEEE  Symposium  on  Foundations  of  Computer  Science,  pages 
725-734,  2009.  6.4 




[83]  Ryan  O’Donnell  and  Karl  Wimmer.  Sharpness  of  KKL  on  Schreier  graphs,  2009. 
Manuscript.  6.4 

[84] Michal Parnas, Dana Ron, and Alex Samorodnitsky. Testing basic boolean formulae.
SIAM J. Discret. Math., 16(1):20-46, 2002. 5, 5.3, 7

[85]  Toniann  Pitassi  and  Rahul  Santhanam.  Effectively  polynomial  simulations.  In  Proc. 
First Symposium on Innovations in Computer Science, pages 370-382, 2010. 6.4

[86]  Alexander  Razborov.  On  the  distributional  complexity  of  disjointness.  In  Proc. 
17th International Colloquium on Automata, Languages and Programming, pages
249-253, 1990. 11, 11.4

[87]  Dana  Ron.  Algorithmic  and  analysis  techniques  in  property  testing.  Foundations 
and  Trends  in  Theoretical  Computer  Science,  5:73-205,  2009.  3 

[88]  Dana  Ron  and  Gilad  Tsur.  Testing  computability  by  width-two  obdds.  Theoretical 
Computer Science, 420:64-79, 2012. 7.3.7

[89] Ronitt Rubinfeld and Asaf Shapira. Sublinear time algorithms, 2011. ECCC
TR11-013. 3

[90]  Ronitt  Rubinfeld  and  Madhu  Sudan.  Robust  characterizations  of  polynomials  with 
applications  to  program  testing.  SIAM  J.  Comput.,  25(2):252-271,  1996.  1.2,  3.1 

[91]  C.  Seshadhri  and  Jan  Vondrak.  Is  submodularity  testable?  In  Proc.  2nd  Innovations 
in Computer Science, 2011. 11.6

[92]  Claude  E.  Shannon.  The  synthesis  of  two-terminal  switching  circuits.  Bell  System 
Technical Journal, 28(1):59-98, 1949. 6.4

[93]  Georgi  E.  Shilov.  Linear  Algebra.  Dover,  1977.  4.1.4 

[94]  Spario  Y.  T.  Soon.  Binomial  approximation  for  dependent  indicators.  Statistica 
Sinica,  6:703-714,  1996.  4.3 

[95]  Madhu  Sudan.  Invariance  in  property  testing.  In  O.  Goldreich,  editor,  Property 
Testing:  Current  Research  and  Surveys,  pages  211-227.  Springer,  2010.  1.3,  6.4,  8 

[96]  Gabor  Szego.  Orthogonal  Polynomials,  volume  23  of  Colloquium  Publications. 
AMS,  fourth  edition,  1975.  4.3.1 




[97] Jacobus H. van Lint. Introduction to Coding Theory, volume 86 of Graduate Texts
in  Mathematics.  Springer,  third  edition,  1999.  4.3,  4.3.1 

[98]  Roman  Vershynin.  Introduction  to  the  non-asymptotic  analysis  of  random  matrices. 
In Y. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications,
chapter 5, pages 210-268. Cambridge University Press, 2012. Available at
http://arxiv.org/abs/1011.3027. 4.1.4, 4.8, 4.1.4

[99] Richard M. Wilson. The exact bound in the Erdos-Ko-Rado theorem. Combinatorica,
4(2-3):247-257, 1984. 4.2.1

[100] Andrew C. Yao. Probabilistic computations: towards a unified measure of complexity.
In Proc. 18th Sym. on Foundations of Comput. Sci., pages 222-227, 1977.
3.3, 9.1, 10.1, 12.2




